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ABSTRACT 


The  problem  of  estimating  state  variables  and  parameters  of 
dynamic  systems  in  the  presence  of  random  disturbances  and  measurement 
noise  is  examined  for  a  wide  range  of  dynamic  systems,  including  linear 
and  nonlinear  systems  in  discrete-  and  con tinuous- time „  This  is  a  funda- 
mental statistical  problem  arising  in9  but  by  no  means  limited  to„  the 
areas  of  adaptive  and  optimum  control,,  Parameter  estimation  is  treated 
as  a  special  case  of  state  variable  estimation. 

The  general  solution  of  the  linear  problem  is  given „  Approxi- 
mation techniques  are  developed  for  the  nonlinear  problem  and  the  results 
of  simulation  studies  demonstrating  the  application  of  these  techniques 
are  presented0 

A  dynamic  programming  formulation  of  the  estimation  problem  is 
developed „ 

A  discussion  of  the  concepts  of  controllability  and  observability 
is  given  in  which  particular  attention  is  paid  to  the  problems  peculiar 
to  discrete°»time  systems 0 

Background  material  on  matrix  identities ,  Gaussian  random  vectors t 
the  generalized  inverse  of  a  matrix,  and  probability  density  functionals 
is  included  in  a  series  of  appendices 0 


Thesis  Supervisor;    Ronald  A0  Howard 
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INTRODUCTION 

The  introduction  of  statistical  considerations  into  a  wide  range 
of  engineering  problems  is  a  logical  consequence  of  the  existence  of  un- 
certainties of  one  type  or  another  which  frequently  make  a  purely  deter- 
ministic approach  to  these  problems  unrealistic  The  field  of  automatic 
control  abounds  in  these  uncertainties,  such  as  those  arising  from  ran- 
dom disturbances  j,  measurement  noise  and  changing  environment.  Often  un- 
certainties about  the  process  or  plant  being  controlled  and  about  future 
commands  are  also  present,,  Hence,  it  is  not  surprising  that  statistical 
methods  have  contributed  significantly  to  the  development  of  modern  con- 
trol theory. 

A  fundamental  problem  in  the  areas  of  adaptive  and  optimum  control 
is  that  of  estimation  of  state  variables  and  parameters  of  a  dynamic  sys- 
tem when  random  disturbances  and  measurement  noise  are  present,,  If  the 
estimates  of  these  quantities  are  to  be  used  for  control  purposes,  then 
the  estimation  problem  must  be  solved  in  real  time0  The  problem  then  is 
to  synthesize  an  estimator  which  will  act  on  all  available  data  and  con- 
tinually produce  up-to-date  estimates  of  the  quantities  of  interest,,  In 
related  problems  of  ex  post  facto  data  analysis  the  severe  real  time  re- 
quirement is  relaxed  and,  at  the  expense  of  computation  time,  priority 
is  given  to  processing  the  often  limited  amount  of  data  available  in  an 
optimal  manner o 

This  report  is  devoted  to  the  examination  of  this  estimation  prob- 
lem for  a  wide  range  of  dynamic  systems »  Chapter  I  presents  the  mathe- 
matical model  to  be  used  in  the  continuous-time  version  of  the  problem 
and  the  generality  of  the  model  and  its  relation  to  other  problems  is 
discussed.  Because  the  analogous  discrete- time  problem  is  conceptually 


simpler,  we  first  examine  it  in  detail  in  Chapter  II  before  returning 
to  the  continuous-time  problem  in  Chapter  III.  Chapter  IV  discusses 
the  concepts  of  controllability  and  observability.  Chapter  V  gives 
the  results  of  some  computational  studies  $  while  several  concluding  re- 
marks and  suggestions  for  possible  further  research  are  contained  in 
Chapter  VI „  Material  of  a  complementary  and  supplementary  nature  is 
included  in  a  series  of  appendices. 

The  principal  references  for  this  report  are  the  work  of  Kalman 
[29.  31,  32 ,  34]  and  the  paper  by  Bryson  and  Frazier  [10],  An  effort 
has  been  made  to  use  a  notation  which  is  compatible  with  these  references, 


CHAPTER  I 

BACKGROUND  AND  PERSPECTIVE 

1 8 1   Introduction 

The  purpose  of  this  chapter  is  to  lay  the  groundwork  for  the 
analysis  of  the  estimation  problem  and  to  discuss  various  aspects  of 
the  mathematical  models  considered  in  the  sequel „  After  taking  care 
of  some  notational  preliminaries  and  reviewing  briefly  the  state 
approach  to  dynamic  systems „  the  continuous-time  model  is  presented,, 
This  model  is  then  related  to  control  problems  and  to  classical  linear 
filtering  theorye  The  author  chose  the  continuous-time  model  for  this 
preliminary  discussion  because  it  is  easily  related  to  the  model 
appearing  most  frequently  in  modern  control  theory,, 

1 02   Notation 

For  simplicity  vector  notation  is  used  throughout  this  worko 
All  vectors  are  column  vectors.  Vectors  are  designated  by  under- 
lined lower  case  letters  and  matrices  by  underlined  upper  case 
letters.  For  example,  the  vector  x  and  the  matrix  A  are  to  be  in- 
terpreted as  follows? 


x1 

x2 


*n 


A     = 


a11      a12 


ml       m2 


1n 


a21     a22  a2n 


mn 


The  transpose  of  x  and  A  are  denoted  by  x'  and  A'  respectively.  The 
inner  product  of  two  vectors  x  and  y_  is  written  as  a  matrix  product «, 
That  is, 

7 


The  Euclidian  norm  of  a  vector  x  is  denoted  by  II  x 


=     \Zx«x      =       ]/: 


'     +     ooo     +    X~ 


Also 


=    x»x 


E 


2 

x 

i       i 


If  A  is  a  positive  definite  matrix ,  the  following  special  notation  is 
used  for  the  associated  quadratic  form 


2  II   =  S'^S  =   ^— •  a  x  x 
A  ij   ij  i  j 


If  x  is  a  function  of  time,  then  x(t)  is  the  derivative  of  x  with 
respect  to  time0  That  is, 


x(t)  =  dx(t)/dt 


dx1  (t)/dt 
dx  (t)/dt 


_dxn(t)/dt 


The  scalar  function  V(x1  (t), ».» 9x  (t)|t)  is  written  V(x(t),t)„ 
The  gradient  of  V  is  a  vector  denoted  by  V  .  The  partial  derivative  of 


V  with  respect  to  t  is  written  V. ,,  That  is, 


V  =  grad  V  = 

2* 


3V/  ax1 


3  V/  3x 


n 


V   =  -aV(x(t),t)/at 


Vector  valued  functions  are  designated  by  underlined  lower  case 
letters 0  For  example. 


f(x,u,t)  = 


1   1  *  °  *  *  '  ^Si  9  ^1  '  °  *  °  '  ^r> ' 


fn<*1 


90005  X  p  U^|  90005 


v*> 


We  use  (3|(x  )/ax)  to  designate  the  Jacobian  matrix  of  partial 


— o 


derivatives  [3f,/ax.  J  evaluated  at  x  =  Xq, 

The  segment  of  the  time-function  u(t)  on  the  closed  interval 
t  -  t  -  t  is  designated  by  u 


1 


1  „3       Equations  of  State 


[to.ti]  • 


The  convention;  of  describing  a  dynamic  system  by  a  set  of  first 
order  ordinary  differential (or  difference)  equations  or,  more  compactly, 
by  a  vector  differential  (or  difference)  equation  is  already  well  estab- 
lished in  the  control  literature 0  Introductory  accounts  of  "state-space 
techniques",  the  name  usually  given  to  a  body  of  concepts  associated 
with  this  characterization,  are  given  in  references  [ 7 » 1 9 9 68 ] e  However, 
since  vector  notation  will  be  used  throughout  this  report,  it  is  appro- 
priate to  review  briefly  some  of  the  basic  ideas  and  relations „ 


Informally,  we  may  say  that  the  future  behavior  of  a  system 
depends  upon  its  present  state  and  any  forcing  functions  or  inputs  which 
may  be  applied  to  the  system  in  the  future.  We  limit  our  discussion  to 
systems  in  which  the  minimum  number  of  quantities  necessary  to  specify 
the  state  of  the  system  at  time  t  is  finite „  The  members  Xj  (t),„00 ,x  (t) 
of  such  a  minimum  finite  set  are  called  state  variables.  The  state  vari- 
ables are  elements  of  the  state  vector  x(t)  which  ranges  over  a  set  X, 
called  the  state  space „  Examples  of  appropriate  sets  of  state  variables 
are  the  amount  of  charge  on  each  capacitor  and  the  amount  of  current  in 
each  inductor  in  an  electric  circuit,  the  position  and  momentum  of  each 
mass  of  a  mechanical  system  and  the  output  of  each  integrator  of  an 
analogue  computer  representation  of  a  system* 

The  equations  of  state  for  a  continuous-time  system  are  usually 
written  as  a  system  of  first  order  ordinary  differential  equations  of  the 
form 

o 

x. (t)  =  f  (x1(t),..o,x  (t),u  (t),...,u  (t)„t)     i  =  1,...,n 
■*■      i  •       n    i       r 

where  u..  „  «,<,„„ u  are  the  inputs  of  the  system,,  The  equivalent  vector 
equation  is 

x(t)  =  f(x(t),u(t),t)  (1.1) 

The  analogous  equations  for  a  discrete- time  system  are 

X.  ^K+1  )    —    X  o  ^X.«  Uj|»»c  sX_^K/  dUj  { l£)  „  o  o  o  »U    (  K)  „K^  2.   -■    lgooo»n 

and 

x(k+1)  =  f(x(k)9u(k),k)  (1.2) 


While  equations  (1.1)  and  (1  .2)  encompass  a  very  large  class  of  systems, 
it  is  well  to  remember  that  they  do  not  include  distributed  parameter 
systems  described  by  partial  differential  equations,  systems  with  time 
delays  described  by  differential-difference  equations  and  other  systems 
of  more  complicated  types 0 

As  an  example  of  the  reduction  of  an  nth  order  nonlinear  differential 
equation  to  a  system  of  n  first  order  differential  equations,  consider 
the  equation 

y(n)(t)  +  g(y9y,ooo,y(n~1)0u0t)  =  o 

Let  Xj  =  y„  X£  =  y9  „  „  .  9   xn  =  yK   "  '.  Then  an  equivalent  system  of 
first  order  equations  is 

e 

x1  (t)  =  XgCt) 

o 

x2(t)  =  x_(t) 


c 

^(w  =~g(x^ , o o 0 „xn9u,t^ 

Important  special  cases  of  (1.1)  and  (1.2)  arise  when  the  system 
is  linear.  For  a  linear  system  (1e1)  becomes 

x(t)  =  F(t)x(t)  +  G(t)u(t)  (1.3) 


One  can  easily  verify  that  the  solution  of  (1 .3)  has  the  following  form; 

't 
t. 


x(t)  =  |(t,t  )x(t0)  +  J        i(t,s)G(s)u(s)ds 


where  £(tptQ)  is  the  transition  or  fundamental  matrix  of  the  system  (1.3). 
The  transition  matrix  is  the  unique  matrix  satisfying  the  following  rela- 
tions; 


i(t  .t  )  =  I 
— v  o9  o'   - 

where  I  is  the  identity  matrix.  For  the  special  case  of  time -invariant 
linear  systems „  F  is  a  constant  matrix  and  the  transition  matrix  is  given 
by 


i(t,t0)  =  exp  [  (t-tjF  ]   =   Z,   (t"to)n^n 


n=o 


n! 


As  an  example  of  the  reduction  of  a  linear  differential  equation 
to  a  system  of  first  order  equations  of  the  form  of  (1.3)»  consider  the 
third  order  time-invariant  equation 

y  (t)  +  a2  y(t)  +  a1  y(t)  +  aQ  y(t)  =  t>2u(t)  +  b.,u(t)  +  bQu(t) 

Let  x. (t)  =  y(t)0  Then  an  equivalent  set  of  first  order  equations  written 
in  vector  form  is 


dt 


"^(t)" 

""2     1     ° 

X,(t) 

x2(t) 

= 

-a,     0     1 

^(t) 

x^t) 

_   _0    0    0  _ 

x3(t) 

Lbo 


The  linear  version  of  the  dis ere te=> time  system  (1 02)  is 


u(t) 


x(k+1 )  =  F(k)x(k)  +  G(k)u(k) 


(1o4) 


The  solution  of  the  discrete-time  linear  system  takes  the  following  form; 

n-1 
x(n)  =  i(n,k)x(k)  +     ]T    §(n,j+1)G(j)u(j)  n>  k 


where 

£(n9k)  =  F(n-1)  £(n-1  ,k)  =  F(n-1 )   •••  F(k)  ,  n>k 

£ (kfk)  =  I 

Procedures  for  reducing  difference  equations  to  a  set  of  first  order 
difference  equations  are  analogous  with  the  ones  illustrated  for  differ- 
ential equations c  For  a  more  detailed  discussion  of  linear  systems  from 
the  state  point  of  view  see  [33]° 

1  A      Mathematical  Model 

1 ,41   Statement  of  the  Continuous -Time  Problem 

We  begin  our  discussion  with  a  mathematical  statement  of  the 
continuous-time  estimation  problem,,  The  motivation  for  the  mathematical 
model  used  will  become  apparent  in  the  discussion  immediately  following. 
The  reader  may  wish  to  read  over  this  section  lightly  at  first,  returning 
after  the  relation  of  the  model  to  various  problems  is  developed. 

The  problem  to  be  considered  is  the  estimation  of  the  state 
variables  of  a  nonlinear  dynamic  system  which  is  excited  by  white 
Gaussian  noise.  It  is  assumed  that  a  nonlinear  combination  of  the  state 
variables  corrupted  by  additive  white  Gaussian  noise  may  be  observed 
over  a  finite  time  interval. 

In  particular,  we  consider  processes  which  may  be  described  by  a 


system  of  nonlinear  differential  equations  which  can  be  written  in 
vector  notation  as 

x  =  f(x(t),t)  +  G(t)w(t)  (1.5) 

The  observed  signal  is 

z(t)  =  h(x(t),t)  +  v(t)  (1.6) 

where ; 

x  is  an  n- vector,  the  state  vector  of  the  system 

w  is  an  m- vector,  m  -  n,  the  random  input 

£  is  a  p» vector ,  the  observation 

G  is  an  n  x  m  matrix 

v  is  a  p~  vector,  the  measurement  noise 

The  functions  w  and  v  are  white  Gaussian  noise  processes  with  zero 
means  and  the  following  covariance  matrices; 

E[w(t)w«(t)l  =  fi(t)  £(t-T) 

E[v(t)v'(r)]  =  R(t)  S(Uy)        \         for  all  t  and  T 

E  [w(t)v'CT)]  =  0 

where  E  is  the  expectation  operator,  S  is  the  Dirac  delta  function, 
and  £(t)  and  R(t)  are  positive  definite  symmetric  matrices  which  are 
continuously  differentiable  in  t0  It  will  be  necessary  to  assume  that 
f  and  h  satisfy  certain  differentiability  conditions  with  respect  to 
their  arguments „ 

This  model  may  be  represented  by  the  block  diagram  of  Fig  0 1 . 
The  wide  arrows  are  used  to  indicate  the  flow  of  vector  quantities. 


,10- 


The  output  of  an  oval  block  is  obtained  by  the  indicated  nonlinear 
operation  on  its  input. 


measurement  noise  v(t) 


random 
input 


observations 


£(£<*).*)  k^z. 


Figc  1 •   Model  of  the  Continuous»Time  System 

Observations  begin  at  an  initial  time  tQ.  The  problem  of  real 

time  estimation  of  state  variables  is  to  use  the  history  of  observations 

from  t  up  to  the  present  time  t  (i0e„9  z  r,   ,-,)  to  continually  produce 

up-to-date  estimates  of  the  present  state  x(t). 

It  will  be  convenient  to  imbed  this  problem  in  the  more  general 

problem  of  estimating  the  entire  segment  xr       ,,  This  point  will 

""|t0#t  +  TJ 

be  discussed  in  subsequent  chapters 0 

1  AZ      Relation  to  Optimum  Control  Problems 

The  so-called  optimum  control  problem  is  usually  formulated 
in  the  following  manner 0  The  physical  process  to  be  controlled  is 
represented  by  a  system  of  ordinary  differential  equations  of  the  form 


dx^t^dt  =  fi(x1,...,xn,u1,...,ur,t) 


j 1  —   I |«gg pn 


where  x^ , . „ „ 9xn  are  the  state  variables  of  the  process ,  u^ „ . . . ,ur 
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are  the  control  variables  and  t  is  time.  These  equations  are  usually- 
written  in  vector  notation  as 

x(t)  =  f(x(t),u(t),t) 

The  control  problem  is  that  of  choosing  u(t)  from  a  set  of 
admissible  controls  so  that  the  state  of  the  system  follows  an  optimal 
trajectory,,  Optimality  is  usually  defined  as  the  minimization  of  a 
scalar  performance  index  which  may  in  general  depend  on  u,  x  and  t. 
For  example ,  if  it  were  desired  to  move  the  system  from  an  initial  state 
xto  a  specified  final  state  x,  ,  the  performance  index  might  include 
time  of  transit9  energy  expended,  or  a  combination  of  these  and  other 
criteria. 

The  practical  considerations  of  model  inaccuracies  and  external 
disturbances  make  it  desirable  to  have  a  closed  loop  system,  that  is, 
one  in  which  the  control  u(t)  is  specified  as  a  function  of  the  state 
x(t)  rather  than  as  a  preprogrammed  function  of  time. 

Despite  its  short  history,  the  literature  devoted  to  various 
aspects  of  this  problem  is  voluminous.  Various  mathematical  techniques 
including  the  calculus  of  variations  [l 4,1 8,36]  ,  the  "Maximum  Principle" 
of  Pontryagin  [52  9  55  9  41  ]  ,  Bellman "s  dynamic  programming  [3»5J6]  »  a^d 
steepest  descent  [9»35]  have  been  used  in  connection  with  this  class  of 
problems.  In  almost  all  of  this  work  it  is  necessary  to  assume  that  the 
state  variables  of  the  system  are  observable  or  exactly  measurable  or 
may  be  calculated  instantaneously  from  other  observable  quantities.  Be- 
cause of  external  disturbances,  instrument  noise  and  the  intrinsic  diffi- 
culty of  physically  performing  the  unbounded  operation  of  differentiation, 
the  assumption  that  the  state  variables  of  the  system  are  known  exactly 
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is  almost  never  justified 0 

It  is  known  that  for  linear  systems  with  quadratic  performance 
criteria  it  is  possible  to  solve  the  estimation  problem  and  the  optimi- 
zation problem  separately  and  still  obtain  the  over-all  optimum  system 
(see  [23]  or  [28 J) «  No  analogous  result  can  be  anticipated  for  nonlinear 
systems 5  that  is,  when  estimates  are  used  in  place  of  the  actual  values 
in  nonlinear  systems  we  cannot  assume  that  the  over-all  system  will  still 
be  optimal „  From  the  viewpoint  of  application,  however,  if  estimates  of 
state  variables  were  available,  one  would  have  no  alternative  but  to  use 
them,  since  the  problem  of  joint  estimation  and  optimization  for  a  non- 
linear system  belongs  to  a  class  of  extremely  difficult  unsolved  statis- 
tical optimization  problems „  Indeed,  either  the  estimation  problem  or 
the  optimization  problem  alone  is  extremely  formidable „ 

A  diagram  of  the  over-all  control  system  using  the  estimates  of 
the  state  variables  is  shown  in  Fig.  2C  The  estimator  operates  on  all 
observable  quantities,  in  this  case  u  and  z9   to  produce  an  estimate  of 
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5(t) 


control  input 
H(t)    . 


it 


estimator   <1 


random 
disturbances 


AZ 


x(t) 

process 


measurements 
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":C 


Fig„  2„   Block  diagram  of  over-all  control  system 
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x  which  it  feeds  to  the  controller.  The  controller  produces  u  which 
acts  on  the  process.  The  value  of  u  is  also  fed  to  the  estimator. 
The  mathematical  model  for  this  noisy  system  is 

x(t)  =  f(x(t),u(t),t)+G(t)w(t) 

z(t)  =  h(x(t),t)  +  v(t) 

This  is  identical  with  the  model  of  (1.5)  and  (1.6)  except  that  we  have 
the  additional  known  quantity  u  appearing  in  the  first  equation.  In  our 
model  this  is  accounted  for  by  the  explicit  dependence  of  f(x(t),t)  in 
(1.5)  on  the  time  parameter  t. 

There  is  no  loss  of  generality  in  assuming  w(t)  to  be  white  noise. 
It  has  been  emphasized  in  the  early  work  of  Bode  and  Shannon  [8],  and 
Zadeh  and  Ragazzini  [67]   that  a  Gaussian  process  with  a  rational  spectrum 
may  be  considered  equivalent  to  the  output  of  a  linear  system  excited  by 

white  Gaussian  noise.  If  we  denote  the  state  variables-  of  this  equivalent 

(2) 
linear  system  by  x  '  and  the  state  variables  of  the  process  to  be  con- 

(1 ) 

trolled  by  x  '  9   we  may  form  a  composite  state  vector 


xo) 


x  = 


x(2) 


and  obtain  equations  of  the  form  of  (1 .5)  and  (1.6). 

A  simple  example  may  help  to  clarify  this  point.  Consider  a 
scalar  process  of  the  form 

x(t)  =  f(x(t),u(t))  +r(t) 

z(t)  =  h(x(t),t)  +  v(t) 
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where  r  is  a  Gaussian  random  process  with  the  spectrum 


N2 


$   (co)  = 


Then  we  may  consider  r  as  the  output  of  the  linear  system 


r  =  -  a  r  +  w 


where  w  is  a  white  Gaussian  noise  process  with  the  spectrum 


If  we  designate  x  by  x^  '  and  r  by  x*  ',  we  obtain  the  following  system 
of  equations; 


_d 
dt 


x0)(t) 
x<2>(t) 


f(x(1)(t),u(t))  +x(2)(t) 


ax(2>(t) 


w(t) 


z(t)  =h(x(1)(t),t)  +  v(t) 

which  are  of  the  form  of  (1.5)  and  (1<,6)„ 

1 o^3   Relation  to  Adaptive  Control 

Although  there  is  no  universally  accepted  definition  of  adaptive 
control,  most  investigators  would  agree  that  a  central  problem  in  the 
theory  of  adaptive  control  is  the  identification  problem  [46  J,  Changing 
environmental  conditions  are  frequently  incorporated  in  the  mathematical 
model  of  the  system  as  changes  of  certain  parameters  in  that  model e  The 
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identification  problem  is  concerned  with  automatically  evaluating  current 
values  of  the  parameters.  Because  these  parameters  usually  enter  the 
model  multiplicatively,  the  problem  of  estimating  them  is  often  equivalent 
to  that  of  estimating  state  variables  of  a  nonlinear  dynamic  system,, 

Estimation  of  parameters  will  be  treated  as  a  special  case  of  es- 
timation of  state  variables.  The  basic  technique  is  to  augment  the 
state- space  of  the  system  to  include  the  parameters  themselves  and  addi- 
tional quantities  to  account  for  the  nature  of  the  statistical  variations 
of  the  parameters ,  For  example,,  an  unknown  constant  parameter  b  would 
be  handled  by  letting  x^-j  equal  b  and  adjoining  the  equation  xn+1  (t)  =  0 
to  the  basic  system  of  equations  describing  the  process «,  A  randomly  varying 
parameter  is  treated  as  the  output  of  a  linear  system  excited  by  white 
noise o  In  this  case  it  is  necessary  to  adjoin  the  states  of  this  fictitious 
linear  system  to  the  states  of  the  basic  process. 

Let  us  consider  a  simple  example  to  illustrate  these  ideas,  A 
problem  of  considerable  importance  is  the  control  of  systems  with  lightly 
damped  mechanical  resonances  L^6j .  Frequently  there  is  a  problem  of  tuning 
the  compensating  network  as  the  resonant  frequency  of  the  system  drifts  due 
to  changing  environmental  conditions „  such  as  are  commonly  experienced  by 
high  performance  aircraft.  For  a  second  order  system  a  suitable  model 
for  this  situation  might  be 

•  •  o 

y  +  b  y  +  (c_  +  c)  y  =  u  +  w. 


0  T  W  jr  -  m.  t  w.| 


c  +  a  c  =  Wo 


z  =  y  +  v 


where  the  random  parameter  c  is  considered  to  be  the  output  of  a  first 
order  system  excited  by  white  noise.  Letting 
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x,  =y 


X,  =  y 


*3 


=  c 


we  may  rewrite  these  equations  in  the  form  of  (1.5)  and  (1.6)  as  follows; 


_d 
dt 


x. 


LX3 


v2 
-b  2 


2 

a  x. 


(cQ  + 


x3)  ^ 


+  u 


0 

o" 

"»!    ' 

+ 

1 

0 

0 

1 

-W2    . 

Z  =  X.  +  V 

1 

Again  the  explicit  dependence  of  f(x(t),t)  in  (1.5)  on  t  accounts 
for  any  known  inputs  such  as  control  forces  or  test  signals. 

1 .44   Relation  to  Linear  Filtering  Theory 

The  application  of  statistical  methods  to  automatic  control 
problems  began  with  the  classic  work  of  Wiener  [65]  on  linear  filtering 
and  prediction  theory.  The  early  exposition  by  Bode  and  Shannon  [8] 
presented  the  principal  results  of  the  Wiener  theory  in  a  form  more 
easily  understood  by  engineers.  At  about  the  same  time,  an  extension  of 
the  Wiener  theory  which  was  particularly  significant  from  a  control  point 
of  view  was  made  by  Zadeh  and  Ragazzini  \jo7\o     Both  of  these  early  papers 
[8,  67]  emphasized  the  viewpoint  that  stationary  random  processes  with 
rational  spectra  could  be  thought  of  as  the  output  of  linear  systems 
excited  by  white  noise.  Several  investigators  [ 1,13*27]  have  extended 
these  ideas  to  multidimensional  systems  0  The  work  of  Davis  [13]  gives 
a  general  solution  to  the  problem  of  factoring  spectral  matrices  and 
relates  the  linear  stationary  theory  to  the  state-space  approach.  Today 
material  on  the  one-dimensional  Wiener  theory  and  its  extensions  is  avail- 
able in  a  large  number  of  standard  textbooks;  for  example,  [11 ,12,39»42„ 
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*9.  57,  60,  63] . 

The  problem  we  consider  is  a  nonlinear  version  of  the  growing 
memory  filters  discussed  by  Kalman  and  Bucy  [3^]  who  studied  random 
processes  described  by  the  linear  vector  differential  equations 

x(t)  =  F(t)x(t)  +  G(t)w(t)  (1.7) 

z (t)  =  H(t)x(t)  +  v(t)  (1 D8) 

which  are  the  linear  form  of  (1.5)  and  (1,6),  They  give  the  general 
solution  to  the  problem  of  linearly  estimating  and  predicting  x(t) 
based  on  a  finite  observation  interval.  If  F,  G,  and  H  are  constant 
matrices  and  the  observation  interval  extends  to  negative  infinity,  this 
is  a  multivariate  Wiener  filtering  problem  in  which  the  random  process  is 
specified  by  the  linear  system  (1.7)  and  (1.8)  rather  than  by  its  spectral 
matrix.  For  this  stationary  case,  (1.7)  and  (1.8)  may  be  obtained  by 
factorization  of  the  spectral  matrix  [13]«  It  is  indeed  unfortunate  that 
the  important  results  of  Kalman  and  Bucy  are  not  well  understood  by  many 
control  specialists.  This  is,  no  doubt,  due  in  part  to  the  elegant  but 
intricate  nature  of  their  original  derivation.  An  interesting  by-product 
of  our  study  of  the  nonlinear  problem  has  been  a  simplified  derivation 
of  their  results  for  the  corresponding  linear  problem,  (see  Chapter  III.) 

The  idea  of  considering  a  random  process  to  be  the  result  of  exciting 
a  linear  system  with  white  noise  has  great  intuitive  appeal.  This  is  es- 
pecially true  for  a  stationary  process  where  a  suitable  linear  system  may 
be  postulated  as  the  result  of  direct  measurement  of  the  spectrum  of  the 
process.  For  the  time-varying  system  of  Kalman  and  Bucy  it  is  not  at  all 
clear  how  to  make  measurements  on  a  non- stationary  process  to  find  a  suit- 
able system.  However,  their  work  has  direct  application  to  control  problems 
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in  which  prior  information  about  a  time-varying  system  is  available,, 

There  is  considerable  appeal  to  the  idea  of  considering  a 
stationary  non-Gaussian  random  process  to  be  the  result  of  exciting 
with  white  noise  a  nonlinear  time-invariant  system  having  a  finite  number 
of  state  variables „  Again  it  is  not  at  all  clear  how  to  make  measurements 
on  the  process  to  determine  the  appropriate  system.  Our  work  is  applicable 
to  cases  in  which  prior  information  about  the  nonlinear  system  is  available, 

1  A5      Generality  of  the  Model 

It  has  already  been  pointed  out  that  any  known  inputs  such  as 
control  forces  or  test  signals  are  taken  into  account  by  the  explicit 
dependence  of  f(x(t)9t)  on  t.  We  also  showed  how  estimation  of  parameters 
could  be  treated  as  a  special  case  of  estimation  of  state  variables. 

There  is  no  loss  of  generality  in  assuming  £(t)  is  positive 
definite  because  of  the  presence  of  G(t)  in  (1.5)o  The  restriction  that 
R(t)  be  positive  definite  is  more  basic.  To  relax  this  restriction  for 
the  continuous-time  problem  is  to  admit  the  possibility  of  differentia- 
ting the  observed  signal  [29] o 

A  more  general  version  of  (1 .5)  would  be 

x(t)  =  f(x(t),w(t),t)  (1.9) 

The  basic  objection  to  (1.9)  is  that  extreme  care  must  be  taken  when  per- 
forming nonlinear  operations  on  white  noise  because  white  noise  is  defined 
by  a  limiting  process  and  the  appropriate  limit  may  fail  to  exist.  For 
example „  it  is  meaningless  to  square  white  noise.  This  objection  may  be 
avoided  by  considering  the  equation 

x(t)  =  f(x(t),t)  +G(x(t),t)w(t)  (1.10) 
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For  simplicity,  we  shall,  in  the  main,  confine  our  attention  to  systems 
in  which  the  white  noise  enters  linearly  as  in  (1.5)»  pointing  out  where 
extensions  can  be  made  to  include  systems  of  the  type  (1.10)  and  well  de- 
fined versions  of  (1.9)<> 


Before  becoming  involved  in  a  maze  of  equations  it  is  perhaps 
worthwhile  to  see  in  which  direction  an  intuitive  approach  might  lead. 

Consider  the  problem  of  estimating  the  state  variables  of  a  non- 
linear system  for  which  all  inputs  are  known  (no  random  disturbances) 
and  the  observations  at  the  output  are  noisy.  A  simple  approach  to  this 
problem  is  to  feed  the  known  inputs  to  a  model  of  the  nonlinear  system 
which  hopefully,  after  some  initial  transients  have  subsided,  will  behave 
in  the  same  manner  as  the  system  itself.  The  model  would  be  constructed 
so  that  its  state  variables  were  easily  measurable 0  Such  a  scheme  is 
shown  in  Fig.  3» 


known  inputs 


^.measurements 


Fig.  3o   Open  loop  model  approach 

We  shall  say  that  the  scheme  of  Fig.  3  uses  the  model  in  an  open  loop 
manner.  While  in  theory  this  approach  should  yield  good  results,  the 
practical  considerations  of  model  inaccuracies  and  random  disturbances 
would  render  it  useless  in  all  but  the  simplest  of  cases. 
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The  obvious  way  to  combat  model  inaccuracies  and  random  dis. 

turbances  is  by  the  use  of  feedback «  This  is  done  in  Fig.  4, 
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Fig,  40   Closed  loop  model  approach 

The  closed  loop  configuration  of  Fig„  4  has  great  intuitive  appeal 0 
While  it  is  not  clear  how  the  error  should  be  processed  in  order  to 
properly  adjust  the  model,  we  would  expect  that  something  could  be 
worked  oute  It  will  be  shown  later  that  the  model  approach  may  be 
derived  as  the  result  of  considering  the  basic  equations  (1 „5)  and  (1 „6) 
and  that  there  is  a  rational  way  of  processing  the  data0 

It  is  amazing  how  many  seemingly  diversified  topics  fall  into 
the  class  of  closed  loop  model  schemes  0  Kalman  and  Bucy  [3^]  show  that 
the  solution  of  the  linear  filtering  problem  may  be  interpretated  in  this 
manner o  Davis  [13]  gives  a  model  interpretation  when  the  measurement 
noise  is  not  white 0  The  very  general  Wiener  theory  of  nonlinear  systems 
[46,  66  J  assumes  that  nothing  is  known  about  the  system  and,  using  white 
Gaussian  noise  as  the  known  test  input,  develops  a  model  in  terms  of  an 
orthogonal  series „ 


.21- 


A  variant  on  this  theme  is  the  optimization  of  the  system  by  ad- 
justing  it  to  behave  more  like  the  model.  The  recent  work  of  Sakrison  [56 ] 
and  Kushner  [38]  in  this  area  may  be  cited  as  examples,  A  large  number 
of  optimization  and/or  learning  schemes  using  models  are  to  be  found  in 
the  adaptive  control  literature  [43„  46 ]„ 

The  various  schemes  using  models  differ  in  the  amount  of  prior 
information  assumed  and  the  way  in  which  observations  are  processed  to 
make  improvements  on  the  model  or  system c  In  our  work  it  is  assumed  that 
the  system  is  represented  by  known  equations  of  the  type  (1.5)  and  (1.6). 
In  the  preceding  sections  it  was  shown  that  many  problems  fall  into 
this  category o  There  are  many  situations,,  however ,  in  which  adequate 
knowledge  of  the  physics  of  the  system  is  not  available  and  one  is  unable 
to  write  equations  of  the  form  of  (1 c5)  and  (1.6)  even  using  the  technique 
of  treating  unknown  parameters  as  additional  state  variables.  Our  work 
is  not  directly  applicable  to  these  situations. 
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CHAPTER  II 
ESTIMATION  OF  STATE  VARIABLES  FOR  DISCRETE-TIME  SYSTEMS 
201   Introduction 

This  chapter  is  devoted  to  the  problem  of  estimating  state 
variables  for  dis ere te- time  systems 0     This  problem  is  completely  anal- 
ogous to  the  continuous=time  problem  discussed  in  the  preceding  chapterc 
The  discussion  of  the  usefulness  and  limitations  of  the  continuous-time 
model  presented  there  applies  also  to  the  discrete- time  model  to  be 
introduced  forthwith,, 

Discrete-time  systems  are  usually  approximations  of  continuous-time 
systems o  They  assume  great  practical  importance  because  digital  computers 
work  in  discrete-time  and  complicated  problems  usually  must  be  solved  on 
digital  computers o  The  extent  to  which  the  ability  to  use  digital  com- 
puters to  obtain  numerical  solutions  to  complex  problems  has  influenced 
modern  technology  is  evidenced  in  the  fact  that  today  an  effective  com- 
putational algorithm  is  often  considered  to  be  a  solution  to  a  problem 
which,  because  of  the  limitations  of  purely  analytical  techniques,  would 
not  even  have  been  investigated  a  decade  ago„  The  possibility  of  using 
special  purpose  digital  computers  as  control  system  components  is  to  a 
large  extent  responsible  for  present  trends  in  control  theory.  We  shall 
continually  be  mindful  of  computational  difficulties  as  the  investigation 
of  the  estimation  problem  proceeds 0 

The  equations  which  characterize  the  remainder  of  this  work  seem 
quite  formidable  at  first  glance «,  The  reader  should  take  heart  in  that 
the  underlying  ideas  are  simple  and  most  derivations  involve  nothing 
more  complicated  than  differentiation c 
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2.2   Problem  Statement 

The  problem  to  be  considered  is  the  estimation  of  the  state 
variables  of  a  discrete-time  nonlinear  system  which  is  excited  by  a 
sequence  of  independent  Gaussian  random  vectors.  It  is  assumed  that 
nonlinear  combinations  of  the  state  variables  corrupted  by  additive  in- 
dependent Gaussian  noise  may  be  observed,, 

In  particular,  we  consider  systems  which  are  described  by  vector 
difference  equations  of  the  following  form; 

x(k+1)  =  f(x(k),k)  +  G(k)w(k)  (2.1) 

The  observed  signal  is 

z(k)  =  h(x(k),k)  +  v(k)  (2.2) 

where 

x  is  an  n° vector,  the  state  vector  of  the  system 

w  is  an  m= vector,  m  -  n,  the  random  input 

z  is  a  p= vector,  the  observation 

G  is  a  n  x  m  matrix 

v  is  a  p~ vector,  the  measurement  noise 

h  and  f  are  vector-valued  functions 

.oo,w(k),w(k+1 ),oo«  and  ee„,  v(k),v(k+1 ),... 
are  sequences  of  Gaussian  random  vectors  with  zero  means  and  the  following 
covariance  matrices • 


E[w(j)w'(k)]   =£(k)  6jk 
E[v(j)y»(kJ]  =R< 

E[w(j)v»(k)]   =  0 


Efl(j)  I'(k)]  =  R(k)  £„  \    for  all  intergers  j  and  k 
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£  and  R  are  positive  definite  matrices ,  E  is  the  expectation  operator 
and  &  is  the  Kronecker  delta.  Again  there  is  no  loss  of  generality 
by  assuming  that  £  is  positive  definite.  The  restriction  that  R  be 
positive  definite  will  later  be  relaxed , 

Having  observed  a  finite  sequence f  £(o ),,,,, ,z (n) j ,  we  may,  in 
general,  seek  an  estimate  of  an  entire  sequence  of  states  |  x(o),,,Mx(n), 
,  c  ,x(n-hn)J  „  We  shall  call  this  the  estimation  problem.  In  this  formula- 
tion we  include  as  special  cases  the  filtering  problem,  where  an  estimate 
of  the  current  state  x(n)  is  soughtf  the  smoothing  problem,  where  an  es- 
timate of  the  sequence{  x(o) » • « ° 9*(n)]  ^s  °^  interest;  and  the  prediction 
problem,  where  an  estimate  of  a  future  state  x(n-Hn)  is  desired. 

As  in  the  continuQUS~time  problem,  we  may  easily  include  as 

state  variables  unknown  or  randomly  varying  parameters.  Any  known 

forcing  functions,  such  as  control  inputs  or  test  signals  are  accounted 

for  by  the  explicit  dependence  of  f(x(k),k)  on  the  time  parameter  k. 

Such  known  parameters  will  usually  simplify  the  estimation  problem  since 

they  provide  additional  information  about  the  system  behavior. 

The  model  for  the  discrete-time  system  may  be  represented  by  the 

block  diagram  of  Jig,  5* 

measurement  noise  v(k) 


random 
input 

w(k) 

G(k) 

+ 

s 

x 

s 

z(k) 

'  observation 

> 


Fig<>  5o   Block  diagram  of  the  discrete- time  system 
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Kalman  [31]  first  solved  the  linear  filtering  and  prediction 
problems  for  the  linear  version  of  (201)  and  (2.2)  given  by 

x(k+1 )  =  F(k)x(k)  +  G(k)w(k) 

z(k)  =  H(k)x(k)  +  v(k) 

In  recent  independent  work  Ho  [25]  has  used  an  approach  similar 
to  the  one  we  shall  use  to  re-solve  this  linear  problem. 

The  systems  described  by  (201)  are  a  subset  of  the  systems  defined 
by  the  more  general  equation 

x(k+l)  =  f(x(k),w(k),k)  (2,3) 

Extensions  to  include  these  more  general  systems  will  be  pointed  out,  but 
it  is  more  convenient  to  work  with  (2„1)  as  the  basic  equation. 

Finally,  we  should  note  that  the  time  parameter  k  designates 
the  time  of  the  kth  observation.  There  is  no  need  to  assume  that  obser- 
vations are  equally  spaced 0  There  is  no  loss  of  generality  in  assuming 
that  observations  begin  at  \^=o9 

2,3   The  Probability  Distribution  for  the  Sequence  of  States 

Let  us  first  examine  the  problem  of  estimating  the  sequence  of 
states  j  x(o)„ 000  9x(n)  J »  having  observed  the  sequence  f  z(o),„,.  sz.(n)}  „ 
Proceeding  in  a  straightforward  manner,  we  consider  the  a  posteriori 
probability  density  function 

p     (x(o),.„0,x(n)|z(o),...,z(n)) 

xNizN 

which  is  the  conditional  joint  probability  density  function  for  the 
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sequence  of  states  [  x(o),...  ,x(n)j  on  the  assumption  that  the  observation 
sequence  [  £(o)9<,«,  „ z(n)]  has  occurred,,  This  function  will  be,  in  general, 
extremely  complex,.  Using  Bayes*  Theorem,  we  obtain 


p„  ,   (x(o),.o.,x(n)|z(o),,o,,z(n)) 

xNl  zN  - 


Pz«ix. (z(o),„ooaz(n)|x(o)Dooo,x(n))  p  (x(o),.eo ,x(n)) 

Pz  (z(o)ff„eo„z(n))  (2» 


Let  us  consider  the  terms  of  interest  which  appear  in  the 
numerator  of  this  expression  Using  (2„2)  and  the  independence  of  the 
measurement  noise,  we  may  rewrite  the  first  term  as 

n 

Pz  ix  (z(o),o.«,z(n)|x(o),ooo5x(n))  =  TT  Pv  (2(k)-h(x(k),k)) 
W   N  k='0  vk 


where  pr  is  the  Gaussian  density  function  for  the  additive  measure. 
vk 

ment  noise  v(k)„ 

The  second  term  may  be  rewritten  in  the  following  form; 

Px  (x(o),„„o9x(n))  =  pQ(x(o))  °  Pj(x(1)|x(o»  a°° 

pn(x(n) | x(n-1 ) , „ o  o 9x(o) ) 

The  independence  of  successive  values  of  w  makes  (201)  a  Markov  pro- 
cess [z6]0     That  is, 

pk(x(k)|x(k-1),000,x(o))  =  pk(x(k)|x(k»1)) 
Substituting  into  (20^)»  we  obtain 
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Px  i  z„te(°) » •  •  •  »£(n)|  2.(o) , . . .  ,z(n)) 

AN'    N 


T  Pv  (z(k)-h(x(k),k))  p0(x(o))  7T  PkCxOOlxCfc-O) 

k=o   k        k=1 

Pz„(z(o),...,z(n))  (2.5) 

N 


We  assign  an  a  priori  Gaussian  distribution  with  mean  m  and  co- 
variance  matrix  P(o)  for  the  unknown  initial  state  x(o).  I*  will  be 
convenient,  although  not  essential,  to  assume  P(o)  is  non-singular 0  This 
restriction  will  later  be  relaxed. 

Before  proceeding  further,  it  is  necessary  to  examine  the  nature 

of  the  input  sequence,,  Let  r(k)  =  G(k)w(k).  Then  .  „  •  ,r (k)  ,r (k+1  ) ,  . •  • 

is  a  sequence  of  independent  Gaussian  random  vectors  with  zero  means. 

The  covariance  matrix  for  r(k)  is  G(k)£(k)G' (k).  If  we  designate  the 

probability  density  function  for  r(k)  by  p   and  then  use  the  fact  that 

rk 

x(k)  =  f(x(k~D,k-1)  +  r(k-1) 

we  obtain  formally 

PkfeOO  |x(k-1 ))  =  p         (x(k)-f (x(k-1  ),k-1 )) 
K  rk-1 

If  GQG1  is  singular,  then  r  is  confined  with  probability  one  to  a 

hyperplane  in  n-space  of  dimension  equal  to  the  rank  of  GQG' ,  and 

without  resorting  to  delta  functions  we  are  unable  to  write  an  explicit 

expression  for  the  probability  density  function  p_  .  We  note,  however, 

rk 

formally 

Pr,  (£(k))  =  /p    (r(k)|w(k))  p  (w(k))  dw(k)  (2.6) 

k  Akl  k  wk 
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where  the  conditional  density  function  p     may  be  interpreted  as  a 

rk|wk 

constraint,  since  r(k)  is  known  once  w(k)  is  specified. 

If,  on  the  other  hand,  the  covariance  matrix  GQG1  is  non-singular, 

an  explicit  expression  may  be  given  for  the  Gaussian  probability  density 

function  p  .  In  this  case  all  the  terms  in  the  numerator  of  (2.5)  may 

rk 
be  expressed  explicitly  since  p  ,  p  ,  and  p  are  all  Gaussian  with  known 

fw 

means  and  covariance  matrices.  Then,  (2.5)  becomes 


PY   i*   (x(o),...,x(n)|z(o),...z(n))  =  C(z(o),„..,z(n)) 

XN,ZN 


■4     £  IIzOO-h(x(k),k)   ||     1         -  *  ||  x(o)-«  ||     < 

k=0  R~   (k)  P     (o) 

n-1  2 

-  *     Y:     ||x(k+1)-f(x(k),k)   ||  .1 

k=0  [GgG'J      J  (2.7) 

where  C(z>(o)SOOo  »z,(n))  is  a  normalizing  factor  depending  only  on  the 
observed  output  sequence  and  known  constants.  The  norm  notation  has 
been  used  to  emphasize  the  positiveness  of  the  quadratic  forms  appearing 
in  this  expression,, 

2ek      The  Estimation  Procedure 

Now  that  an  expression  for  the  a  posteriori  probability  density 
function  has  been  obtained  for  the  case  in  which  GQG1  is  non-singular, 
the  nature  of  the  difficulties  introduced  by  nonlinearities  is  apparent. 
One  might  choose  an  estimate  x  so  as  to  minimize  the  conditional  expecta- 
tion of  some  appropriate  loss  function.  That  is 


Min  |   E  [  L(x-x)]  I 
X   (.   x|z  J 
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For  a  quadratic  loss  function  defined  by  a  positive  definite  matrix 
W,  it  is  well  known  [29]  that  the  best  estimate  is  the  conditional  mean. 
To  see  this  let 


2 
L,  (x-x)  =  ||x-x  ||    =  x'Wx  -  2x«Wx  +  x'Wx 
W 


Then 


E  [^(x-x)]  =  x'Wx  -  2x'Wx  +  x'Wx 
x|z 

Differentiating  with  respect  to  x  and  setting  the  result  equal  to  zero 
we  find  that  the  best  estimate  is  x,  the  conditional  mean. 

The  task  of  finding  the  mean  of  (2.7)  is  overwhelming  because  of 
the  nonlinearity  of  the  system. 

It  is  also  well  known  [j&l  that  for  a  linear  loss  function  of  the 
form 

L2(x-x)  =  2  ^  Ixj-xj  c±>  0 

the  ith  coordinate  of  the  best  estimate  is  the  median  of  the  marginal 
distribution  for  x. .  To  see  this  note  that 

E  [Lp  (»-£)]  =  zL  c.  |x.-x.| 

xiz  c  i   x  *    1 


and 

l~~J   =   /  i  (x^)  Pi(x.)  dx.  +  J      (x^)  Pi(Xi)  dxi 

-OO  Xj 

Taking  the  derivative  of  this  expression  with  respect  to  x.  and  setting 
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the  result  equal  to  zero,  we  obtain  the  desired  result, 

A 

xi  f°° 

f       pi(xi)  dxi  =  a  J        Pi(Xi)  dx± 

—  oo  J  X. 

Again  the  nonlinearity  of  the  system  is  our  nemisis  if  we  try  to  apply 
this  result  to  (2.7)* 

An  appealing  loss  function  for  many  applications  is  one  that 
weights  equally  all  errors  greater  than  some  threshold  value.  This 
would  express  the  idea  that  once  the  size  of  the  error  exceeded  a  spec- 
ified level,  an  increase  in  the  size  of  the  error  would  not  be  signifi- 
cant. An  example  of  such  a  loss  function  is 


L^x-x)  = 


x-x 


W 


,  for  II  x-x 


,  for  I  x-x 


W 

2 

I 
W 


sf  c 


>  c 


For  this  type  of  loss  function  we  would  expect  that  the  mode 
would  be  a  better  estimate  than  either  the  mean  or  the  median.  To  gain 
some  insight  into  this  situation,  consider  the  loss  function  and  the  proba- 
bility  density  function  for  a  scalar  random  variable  0  shown  in  Fig.  6. 


*  p(©) 


♦  L(e-e) 


->  e 


Fig.  6.   A  probability  density  function  and  a  loss  function 


It  is  clear  that  for  the  situation  of  Fig.  6  the  expected  loss 


^0-0 


01 


(area  of  the  product)  will  be  minimized  by  locating  the  valley  of 

L(9-6)  to  correspond  to  the  peak  of  p(6).  That  is,  the  mode  would  be 

the  suitable  estimate „ 

It  is  known  (see  [29]  or  [58])  that  if  the  distribution  function 

Pv,_(x)  is  unimodal  and  symmetric  about  its  mean,  then  the  conditional 
x  i  z 

mean  is  the  best  estimate  for  a  large  class  of  symmetric  loss  functions 
including  L1 ,  L£  and  Lo  above.  We  hasten  to  point  out  that  in  this  case 
the  mean  and  the  mode  coincide  and  one  might  just  as  well  say  that  the 
mode  were  the  best  estimate. 

The  estimate  we  shall  use  is  the  mode  of  the  a  posteriori  dis- 
tribution. This  is  closely  related  to  the  method  of  maximum  likelihood; 
the  difference  being  that  we  take  the  Bayesian  point  of  view  of  assigning 
an  a  priori  distribution  to  the  initial  state. 

For  GQG!  non-singular,  the  estimate  will  be  obtained  by  minimizing 

2J  =||x(o)-m  ||       +  Y,    l|z(k)-h(x(k),k)  ||   1 

P"  (°)    k=o  &     00 

n-1  2 

+  Z  llx(k+1)-£(x<k)fk)  II      =1 

k=o  [G^!]  (2.8) 

with  respect  to  the  sequence  { x(o),,,,,x(n)}  ,  This  is  equivalent  to 
minimizing  with  respect  to  the  sequences  j x(o),.,,,x(n)J  and  (w(o),,,,, 
w(n-1 )j ,  the  expression 

||x(o)-m  ||      +  Y   »£(k)-h(x(k)9k)  ||      +  £  ||w(k)  || 

P"  (o)    k=o  R  00    k=o      £"1(k) 

subject  to  the  constraint 

x(k+1)  =  f(x(k),k)  +  G(k)w(k) 
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Introducing  an  n-vector  of  Lagrange  multipliers  A  to  incorporate  the 
constraint,  we  may  minimize 

I„  =  *|l£(o)-S  II     «         +    E  illz(k)-h(x(k),k)  II 

n  P     (o)         k=o  R     00 

+    £  {  ill  H(k)  II     1         +  A'  (k)  [  x(k+1  )-f  (x(k),k)-G(k)w(k)]}     (2.9) 
k=o  2"   (k) 

From  (2.6)  we  see  that  the  function  I  may  also  be  minimized  when  GQG' 
is  singular. 

The  more  general  problem  of  estimating  the  state  sequence 
( x(o),...,x(n),..„1,x(n-»m)  }  after  the  sequence  ( z(o ),..., z_(n)j  has  been 
observed  may  be  reduced  by  an  analogous  development  to  that  of  minimizing 


Jn  m  =  *  l£(o)-S  Hi    +  E  *  ||z(k)»h(x(k),k)  || 
n'm  V  (o)    ^°  tr 


(k) 


n4m~1  r  2 

+      T,      i  IllOO  II      1         +  A' 00  [x(k+1)-f(x(k)9k)-G(k)w(k)] 
k=o  2    00 

(2.10) 

A  comparison  of  (2.9)  and  (2.10)  yields 

n+m=1  r  2 

*n  m  =  Jn  +       E      i  llsCk)  II     i         +  A' 00  [  x(k+1  )-f  (x(k),k)-G(k)w(k)] 
n,ra        n  k=n   l  ^(k) 

(2.11) 

From  (2.11 )  we  conclude  that  I    may  be  minimized  by  first  mini- 
mizing  In  and  then  setting  w(k)  equal  to  zero  for  k=n,...,n-fm-1 .  This 
leads  us  to  the  important  conclusion  that  predictions  are  simply  extrapola- 
tions of  present  estimates.  Specifically,  if  we  let  x(k|n)  denote  the 
estimate  of  x(k)  given  {  z(o)to«. ,z(n)J ,  then 
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x(n+1  In)  =  f(x(nln),n) 
x(n+2|n)  =  f(x(n+1|n),  n+1 ) 


x(irta|n)  =  f(x(rHm-1|n),  n-to-1 )  (2.12) 

Several  investigators  (see  [13  J  and  [29])  have  pointed  out  that  extra- 
polation of  state  variables  is  an  interpretation  of  what  the  Wiener 
predictor  does  in  the  linear  case0 

For  the  more  general  system  (2.3) »  the  basic  process  is  still 
Markovian.  In  this  case  a  modified  version  of  (2*9)  is  obtained 

I„  <■  * II 5(<>)-S  II  -    +  E  il|z(k)-h(x(k),k)  ||  1 

Pral(o)    k=o  R  (k) 

+    ZfillsOOH     4         +A'(k)  [x(k+1)»f(x(k),w(k),k)j]  (2.13) 

k=o (  gr1  (k)  ) 

For  this  case  (2.12)  becomes 

x(n+1  In)  =  f(x(nln)90,n)  (2.14) 


Having  reduced  the  estimation  problem  to  a  minimization  problem, 
the  possibility  arises  of  proceeding  sequentially  by  a  dynamic  pro- 
gramming formulation  [3„  4,  5,  20 ]  and  at  each  step  obtaining  an  estimate 

-1 
of  the  present  state.  This  can  be  done  is  either  f   exists  or  GQG'  is 

non-singular. 

<-1 

Case  i.   f   exists  for  k=o,„o..n-1 
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If  f~1  exists  we  have  from  (2.1) 


x(k-1 )  =  f~1  (x(k)-G(k-1 )w(k-1  ),k-1 ) 


(2.15) 


In  order  to  establish  a  sequential  procedure,  we  define  the  following 
scalar  "cost"  functions 


2  2 

V0(x(o))  =  Ux(o)-m  II  !    +|z(o)-h(x(o),o)  ||  1 
°  P"  (o)  R-  (o) 


Vn(x(n))  =  Min 

w(o),..„,w(n-1 ) 


x(o)-m  II   1 

H(o) 


n 


2        n-1       2 
+  Z  IU(k)=h(x(k),k)  ||  1    +  V   ||w(k)  ||      >n=1,2, 


(2.16) 


subject  to  the  constraint  (2.1 5)  <.  The  separability  property  of  Vn 
enables  us  to  rewrite  this  relation  in  the  following  form 


Vn(x(n))  =  Min       /Min 


w(n~1 )  |  w(o) , . „ . ,w(n-2) 


||x(o)-m|| 


n-1  2  n-2  2 

^  R~1(k)         k'fe  ^(k) 


k=o 


2  2 

+  ||z(n)-h(x(n),n)  II     -         + 1|  w(n-1 )  ||     1  )>n=1,2,o 

R     (n)  £     (n-1) 


subject  to  the  constraint  (2.1 5) <>  This  is  equivalent  to 


V  (x(n))=  Min 

w(n-1 ) 


Vl^(»-1))+  ||z(n)-h(x(n)0n)  I  1 

R"  (n) 


+  II  w(n-1 )  || 

fl" 1  (n-1 ) 
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again  subject  to  (2.15).  Finally,  using  (2.1 5)  t  we  obtain  the  functional 
equation 

V  (x(n))  =  Min    J  V  1 (f°1 (x(n)-G(n-1 )w(n-1 ),n-1 )) 
w(n-1 ) 

2  2 
+  l|w(n-1)||        +  ||z(n)-h(x(n),n)  || 

3  (»-1)  R  (n)J      (2.17) 

Let  c  be  any  particular  value  of  x(n).  We  may  interpret  Vn(c) 
as  a  measure  of  the  unlikeliness  of  the  most  probable  sequence  of  states 
{&(o)»«.«»£(n)J  in  which  x(n)  "t^kes  on  the  particular  value  c,  given  the 
observed  sequence  [  z(o)5...  ,z,(n) }  and  the  a  priori  distribution  for  x(o). 
The  estimate  of  x(n)  is  that  value  of  x(n)  for  which  V  (x(n))  is  minimumc 
This  corresponds  to  choosing  the  x(n)  °f  the  most  probable  (least  unlikely) 
of  all  possible  sequences  of  states  given  the  observation  and  the  a  priori 
distribution. 

This  procedure  may  be  extended  to  the  more  general  system  (2.3) 
if  x(n-1  )  is  uniquely  specified  by  x(n)  and  w(n-1).  For  such  systems 
there  will  exist  a  relation  of  the  form 

x(n~1)  =  £(x(n),w(n-1)„n~1)  (2.18) 

Using  (2.18)  in  place  of  (2.15)  in  the  preceding  development,  the  follow- 
ing functional  equation  is  obtained; 

Vn(x(n))  =  Min   J  Vn  1  (£(x(n),w(n-1  ),n-1  )) 
w(n=1 )  | 

2  2 

+  |w(n-1)||        +||  z(n)~h(x(n),n)  ||   1 

^(n-O  R  (n)|        (2.19) 
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Case  iio   G(k)£(k)G' (k)  is  non-singular  for  k=of)><>.  ,n„ 

Let  us  first  consider  the  problem  of  predicting  one  step  ahead. 
Then  the  function  to  be  minimized  is 


n    f  2  2 

2J     1   =     2  {  II  z(k)-h(x(k)tk)  ||      1         +  ||  x(k+1)-f(x(k),k)  ||  _., 

n>1  k=o  C  R"   (k)  [QQO'l 


+    II  x(o)-m  ||     , 

P     (o) 


We  define  a  "cost"  function  as  follows; 


W0(x(o))  =  ||  x(o)-m  ||      1 

P"   (o) 


W      (x(n+1))=    Min  <||x(o)~m  ||  +    ^    f||  z(k)-h(x(k),k)  ||      1 

x(o),...fx(n)     ]  P" \o)         k=o    L  R~1(k) 


+    ||x(k+1)-f(x(k),k) 


[G2G»]  J 


n=1,2,...  (2e20) 


Using  the  separability  property  as  before,  this  relation  may  be  re« 
written  in  the  following  form; 


W n+1(x(n+1))  =  Min  J   Min 


x(n)   x(o),ooo,x(n-1) 


p  —  4  p 

||x(o)-m|  1    +  £{||z(k)-h(x(k)fk)||  1 
P  (o)    k=ol  R~\ 


+  ||x(k+1)-f(x(k)„k)  ||      - 

[gqg«] 


+  ||z(n)-h(x(n),n)  ||   1 

R°  (n) 


+  ||x(n+1)-f(x(n),n; 


fssa»j  ~1 
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Using  the  definition  of  W  (x(n)),  we  obtain  the  following  functional 
equation; 

I  2 

Wn+lte(n+1))  =  M*    <W_(x(n))  +  ||  x(n+1)-f(x(n),n)||      ^ 
n+1  -        x(n)  {  n  [GSG'J 

2 
+  ||z(n)»h(x(n)9n)  II   1 

R  (n)J  (2„21) 

We  may  give  W  +1  (x(n+1 ) )  an  interpretation  similar  to  the  one  given 
to  V  (x(n))  in  case  i.  Let  c  be  any  particular  value  of  x(n+1)«,  Then 
W+1  (c)  is  a  measure  of  the  unlikeliness  of  the  most  probable  sequence 
of  states  {  x(o ),„<,„, x(n+1  )  J  in  which  x(n+1 )  takes  on  the  particular  value 
c,  given  the  observed  sequence  {  z_(o ),,„., z^n)]  and  the  a  priori  distribution 
for  x(o)«  The  estimate  of  x(n+1 )  based  on  the  observed  sequence  {  z(o),„. , 
z_(n)j  is  that  value  of  x(n+1  )  for  which  W  +1  (x(n+1  ))  is  minimum „ 

The  estimate  of  x(n)  based  on  {  z(o),.,0,z(n)j  is  the  value  of 
x(n)  for  which  the  following  function  is  minimum; 

2 
Vn(x(n))  =  Wn(x(n))  +  ||  a(n)-h(n),n)  ||  1 

R"  (n)  (2022) 

where  Vn(x(n))  has  the  same  interpretation  as  in  case  i.  Since  the  value 
of  x(n)  for  which  W(x(n))  is  minimum  is  the  predicted  value  of  x(n)  given 
(z/cOj.o.jizCn-l )] ,  we  may  give  (2„22)  the  following  simple  intuitive  inter- 
pretation,, The  new  observation  z(n)   is  weighted  according  to  its  reliability, 
R~  „  and  used  to  modify  the  predicted  value  of  x(n)  to  obtain  the  current 
estimate  of  x(n)»  The  degree  of  uncertainty  of  prior  knowledge  is  reflected 
in  the  sensitivity  of  Vn(x(n))  to  changes  in  x(n)„ 
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If  the  basic  system  were  linear,  either  of  the  two  functional 
equations,  (2.17)  or  (2021),  could  be  solved  analytically  by  assuming 
a  solution  of  the  following  form; 

||  x(n)-x(n)  ||     +  c(n) 
A(n) 

where  A(n)  is  positive  definite  and  is  the  inverse  of  the  covariance 
matrix  of  the  conditional  distribution  for  x(n)  given  the  observations, 
and  where  x(n)  is  the  estimate  and  c(n)  is  a  scalar .  The  recurrence  re- 
lations obtained  by  Kalman  [31]  for  the  estimate  and  the  covariance  matrix 
can  be  obtained  in  this  manner,  but  the  algebra  is  quite  involved.  For 
a  linear  system,  as  we  shall  see  shortly,  an  analytic  solution  may  be 
obtained  by  other  methods  when  the  restrictions  of  cases  i  and  ii  above 
do  not  hold.  In  the  literature  [29,  31 »  32,  25]  it  has  been  assumed  that 
the  transition  matrix  F(k)  is  non-singular  which  corresponds  to  case  i. 

It  is  perhaps  worthwhile  to  review  briefly  how  the  computation 
would  proceed  using  the  functional  equations.  For  greater  detail  the 
reader  is  referred  to  [3,  4,  5]» 

For  (2,17)  and  (2.19)  w  is  said  to  be  the  decision  vector.  This 
is  because  an  appropriate  "decision",  w(n~1),  may  be  associated  with  each 
possible  state  x(n)  in  such  a  manner  that  (2.17)  or  (2.19)  is  satisfied. 
For  (2.21)  the  situation  is  slightly  different  in  that  x  is  both  the 
decision  vector  and  the  state  vector.  From  (2.21 )  we  see  that  with  each 
possible  state  x(n)  we  may  associate  an  appropriate  decision  x(n-1 ).  As 
the  computation  proceeds  at  each  step  the  value  of  the  "cost"  function 
and  the  appropriate  value  of  the  decision  vector  may  be  calculated  for 
each  possible  value  of  the  state  vector.  An  estimate  is  obtained  by 
choosing  the  value  of  the  state  vector  for  which  the  "cost"  function  is 
minimum. 
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If  one  is  interested  only  in  estimating  present  state  variables 
or  predicting  future  state  variables,  there  is  no  need  to  store  old 
values  of  the  decision  vector  for  each  possible  value  of  the  state 
vector.  This  is  different  from  most  dynamic  programming  problems. 

If  it  is  also  of  interest  to  improve  the  original  estimates  of 
past  states  after  the  entire  observation  sequence  has  been  observed,  then 
the  sequence  of  appropriate  decisions  for  each  possible  state  must  be 
stored.  Trading  time  for  memory,  this  may  be  done  on  tape.  The  revised 
estimate  of  x(k~1 )  is  specified  by  the  decision  vector  associated  with 
the  revised  estimate  of  x(k). 

The  importance  of  the  dynamic  programming  formulation  of  the  es- 
timation problem  lies  in  the  possibility  of  processing  each  new  observation 
as  it  occurs.  Herein  there  is  hope  of  solving  the  estimation  problem  in 
real  time.  Modern  computers  have  adequate  fast  memory  to  handle  second 
order  systems  in  this  manner.  For  higher  order  systems  approximation 
techniques,  such  as  polynomial  approximations,  may  be  used  in  conjunction 
with  the  functional  equations.  Computational  aspects  of  dynamic  programming 
are  discussed  in  [  5 ]• 

2»6   Two- Point  Boundary  Value  Problems 

Let  us  now  consider  the  problem  of  minimizing  (2.9)  which  is 
rewritten  here  for  convenience, 

In  =  i  II  2E(o)-m  ||  1    +  JT  f  ||z(k)-h(x(k),k)  || 
n  p"1(o)    feo  V1(k) 

n-1    f  2 

+    SlillHOOH      4         +  A'OO  [x(k+1)~f(x(k),k)-G(k)w(k)] 
k=o  L  SHCk) 

(2.9) 
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Setting  equal  to  zero  the  partial  derivatives  of  In  with  respect 
to  w(k),  X.(k),  (for  k=o,...  ,n-1 ),  and  x(k)  (for  k=o,...,n)  yields  the 
following  equations  which  x(kln)  must  satisfy; 

x(k+lln)  =  f(x(k|n),k)  +  G(k)w(k)          k=o n-1  (2.23) 

w(k)  =  ^(kjG5 (k)A(k)                    k=o,...,n-1  (2.24) 

X(k-1)  =  (  3f«(x(kln),n)/dx)  X(k)  +  (ah'(x(k|n),k)/3x) 

S"1W  [  £<k)  -  h(x(k|n),n)]  k=1,...,n  (2„25) 

X(n)  =  o  (2.26) 

x(o|n)  =m  +  P(o)  Oh«(x(o|n),o)/ax)  R~1  (o)  [  z(o)-h(x(o  |n),o)  ] 

+  P(o)  Of'(x(o|n),o)/3x)  A(o)  (2.27) 

Combining  (2.23)  and  (2.24)  to  eliminate  w,  we  obtain  necessary  conditions 
in  the  form  of  the  following  two- point  boundary  value  problem; 

x(k+1  In)  =  f(x(kln),k)  +  G(k)£(k)G» (k)  A(k)  k=o,...,n-1       (2.28) 

The  equation  for  A  is  (2.25)  a*id  the  boundary  conditions  are  given  by 
(2.26)  and  (2.27).  The  more  general  problem  of  minimizing  (2.13)  yields 
the  following  similar  two-point  boundary  value  problem; 

x(k+1|n)  =  f(x(k|n),w(k),k)  k=o,.„.,n~1       (2.29) 

w(k)=£(k)  (c>f!(x(kln)Dw(k))/aw)  A(k)     k=o,...,n-1       (2.30) 

The  remaining  equations  are  identical  to  (2.25),  (2.26)  and  (2.27)  except 
that  (3f(x(kln),k)/dx)  is  replaced  by  (  df  (x(k|n),w(k),k)/ 3x)  in  (2.25) 
and  (2.27). 
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These  two-point  boundary  value  problems  are  discrete  analogues 
of  the  one  obtained  by  Bryson  and  Frazier  [  1 0 J. 


The  linear  version  of  the  basic  system  is 

x(k+1 )  =  F(k)x(k)  +  G(k)w(k)  (2.31 ) 

z(k)  =  H(k)x(k)  +  v(k)  (2.32) 

In  this  special  case  the  two=point  boundary  value  problem  is 

x(k+1|n)  =  F(k)x(kln)  +  G(k)£(k)G» (k)  A(k)         k=o,...,n-1  (2.33) 
\(k~1)  =  F*(k)  X(k)  +H»(k)r1(k)  [  z(k)-H(k)x(kln)]  k=o,„..,n-1  (2.3*0 

with  the  boundary  conditions 

x(oln)  =m  +  P(o)H«(o)R"1(o)  [  z(o)-H(o)x(oin)  ]  +  P(o)F»(o)  A(o) 

(2.35) 
X(n)  =  o  (2.36) 

2.71   Preliminary  Discussion 

Before  presenting  the  general  solution  to  this  problem,  we 
shall  first  consider  the  simple  cases  n=o  and  n=1  in  order  to  develop 
some  feeling  for  the  nature  of  the  solution. 

For  n=o,  (2.35)  and  (2.36)  may  be  combined  to  give 

x(o|o)  =  m  +  P(o)H«(o)R-1(o)  [  z(o)-H(o)x(olo)  ]  (2.37) 

or 
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[I  +  P(o)H»(o)R~1(o)H(o)]  x(o|o)  =  m  +  P(o)H'(o)R"1(o)z(o) 

The  appropriate  inverse  will  always  exist  for  R(o)  positive  definite 0 
Thus, 

x(olo)  =  [  I  +  P(o)H'(o)R"1(o)H(o)  J"  [m  +  P(o)H'(o)R"1(o)z(o)] 

This  may  be  rewritten  in  the  more  convenient  form 

x(o|o)  =  m  +  [  I  +  P(o)H'(o)R"1  (o)H(o)  ]  "1P(o)H»  (o)R'1  (o)  [  z(o)-H(o)m] 

(2.38) 

■ 
Let 

C(o)  =  [  I  +  P(o)H» (o)R~1  (o)H(o)  ]  "1P(o)  (2.39) 

Then  (2.38)  becomes 

x(o|o)  =  m  +  C(o)H'(o)R~ 1 (o)  [  z(o)-M(o)m  ]  (2.40) 

A  comparison  of  the  boundary  condition  (2.35)  and  (2.37)  shows  that 
(2.35)  Biay  be  reduced  by  a  similar  procedure  to  the  following  form; 

x(o|n)  =x(o|o)  +  C(o)F'(o)A(o)  (2.41) 

The  relation  (2.41 )  is  the  key  to  the  general  solution,  as  we  shall  see 
shortly. 

For  the  case  n=1 ,  (2.33),  (2.3*0  and  (2.41)  become  respectively, 

5(1  M)  =Z(o)x(ol1)  +  G(o)fi(o)G'(o)>(o)  (2,42) 

X(o)  =  H»  (1  )R"1  (1 )  [  z(1 )-  H  x(1  1 1  )  ]  (2.43) 

x(o|1)  =  x(olo)  +  C(o)F'(o)\(o)  (2.44) 
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The  relations  (2.42)  and  (2.44)  may  be  combined  as 

4(1  ID  =  F(o)x(olo)  +[F(o)C(o)F»(o)  +  G(o)£(o)G'(o)]  X(o)  (2.45) 
Let 

P(1 )  =  F(o)C(o)F»  (o)  +  G(g)£(o)GJ  (o)  (2.46) 

Using  (2.46)  and  substituting  for  X(o)  from  (2.43),  we  rewrite  (2.45)  as, 

5(1  ID  =  F(o)x(olo)  +  P(1)H'(1)R"1(1)  ["CD  -I  £(111)]        (2e47) 

Let  us  note  that  F(o)x(olo)  =  x(1lo)  by  (2.12),  and  compare  (2.4?)  with 
(2.37 )<>  We  come  to  the  immediate  conclusion  that  a  procedure  similar  to 
the  one  used  for  n=o  will  yield  an  equation  similar  to  (2.40).  Let 

C(1 )  =  [  I  +  P(1  )H»  (1  )R~1  (1  )H(1 )  ]  ~1  P(1  )  (2.48) 

Then  the  equation  corresponding  to  (2.40 )  is 

x(1 1 1 )  =  x(1  lo)  +  0(1  )H»  (1  )R°1  (1  )  [  z(1  )-H(1  )x(1 1  o)]  (2.49) 

Equations  (2.41),  (2.46),  (2.48)  and  (2.49)  will  now  be  shown  to 
be  similar  in  form  to  the  solution  for  arbitrary  n, 

2.72.   Solution  to  the  Linear  Two~ Point  Boundary  Value  Problem 

Theorem;  Consider  the  linear  two-point  boundary  value  problem  given 
by  (2.33),  (2.34),  (2.35)  and  (2.36)  where  P(o)  and  R(k)  are  positive 
definite  for  k=o,...,n.  Then  x(kln)  satisfies  the  relation 

x(kln)  =x(klk)  +  C(k)F»  (k)A(k)  k=o,...,n       (2.50) 

where  x(olo)  is  given  by  (2.40)  and 
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C(k)  =  [  I  +  P(k)H« (k)R-1  (k)H(k) ]  "1  P(k)  (2.51 ) 

P(k+1 )  =  F(k)C(k)F» (k)  +  G(k)£(k)G» (k)  (2.52) 

x(klk)  =  F(k-1)x(k-llk~1)  +  C(k)H«  (k)R~1  (k) 

[z(k)-H(k)F(k-l)x(k~1lk-1)]  k=1,...,n      (2.53) 

To  establish  a  proof  we  proceed  inductively  showing  that  if  (2.50) 
is  satisfied  by  x(kln),  then  it  is  also  satisfied  for  x(k+1ln). 
Proof;  Suppose  that  x(k|n)  satisfies  (2.50)  for  some  k  €  (o,... ,n-1 ). 
Then,  by  (2.33). 

x(k+1ln)  =F(k)x(klk)  +  [  F(k)C(k)F»(k)  +  G(k)£(k)G» (k)]  A(k) 

Substituting  P(k+1 )  from  (2.52)  and  using   (2.34),  we  obtain 

x(k+1 1  n)  =  F(k)x(kl  k)  +  P(k+1  )H»  (k+1  )R~1  (k+1 )    [  z(k+1  )-H(k+1  )x(k+1  |n)] 

+  P(k+1  )F'(k+1  )Ji(k+1  ) 


or 


[  I  +  £(k+1  )H»  (k+1  )R°1  (k+1  )H(k+1 )  ]    x(k+1|  n)  =  F(k)x(kl  k) 

+  P(k+1  )H'  (k+1  )R°1  (k+1  )z  (k+1 )  +  P(k+1  )F»  (k+1  )^(k+1 ) 

The  needed  inverse  will  always  exist.     After  some  manipulation,,  this  yields 
the  following  equation? 

x(k+1 1  n)  =  F(k)x(kl  k)  +  C(k+1  )H»  (k+1  )R~1  (k+1 )    [  z  (k+1 ) 

-  H(k+1  )F(k)x(k| k)  ]     +    C(k+1  )F'  (k+1 )  A  (k+1  ) 

Using  (2.53),  we  obtain  the  desired  result; 
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x(k+1 1  n)  =  x(k+1  I  k+1 )  +  C(k+1  )F«  (k+1 ) A (k+1 ) 

To  complete  the  proof  we  note  that,  by  (2.^1),  the  hypothesis  is  satisfied 
for  x(o|n). 

Discussion;  Note  that  no  assumption  is  made  that  F(k)  is  non-singular. 
The  recurrence  relations  (2.51 )»  (2<>52)  and  (2,53)  were  first  obtained  by 
Kalman  [31J  in  a  somewhat  different  form  and  using  a  very  different  approach* 
Using  the  following  matrix  identities  derived  in  Appendix  A; 

[1 +  .snr1£]°1  n'lf1  =  ph»  [  hph'  +r]~1 

[I  +  fS'IT1!]"1  L  =   £  -  IS*  Fhph'  +  r"1"1  hp 

we  may  combine  (2.51 )  and  (2.52)  and  rewrite  (2.53)  to  obtain  equations 
more  closely  resembling  Kalman  "s; 

x(k|k)  =  x(klk-l)  +  P(k)H'(k)  [H(k)P(k)H'(k)  +  R(k)l  ~1 

[z(k)-H(k)x(k|k-1)]  (2.5*0 

P(k+1 )  =  F(k)  {  P(k)-P(k)H> (k)  [H(k)P(k)H» (k)  +  R(k)  ]  "1  H(k)P(k)  }  F' (k) 
+  G(k)£(k)G » (k)  (2.55) 

For  the  linear  system,  (2031 )  and  (2.32),  all  probability  distribu- 
tions remain  Gaussian.  P(k)  is  the  covariance  matrix  of  the  distribution 
for  x(k)  given  {  z(o),000  ,z_(k-1  )]  and  C(k)  is  the  covariance  matrix  of  the 
distribution  for  x(k)  given  (z(o))0„  ,z (k)j  »  (see  Appendix  D) 

The  solution  of  the  problem  of  estimating  the  present  state  of  the 
system  is  given  by  (2.51 ),  (2.52)  and  (2.53) „  or  equivalenUy  by  (2.51), 
(2.5^)  and  (2.55).  In  practice  the  decision  to  use  either  (2.52)  and 
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Fig.  7   Noisy  linear  system  and  optimal  estimator 
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(2.53)  or  the  equivalent  relations  (2.5*0  and  (2.55)  would  be  made  to 
minimize  the  dimension  of  the  matrix  to  be  inverted.  A  block  diagram 
of  the  system  and  the  estimator  is  given  in  Fig.  7»     The  optimal  estimator 
is  linear  and  consists  of  a  model  of  the  system  in  a  feedback  loop.  Com- 
pare Fig,  7  with  Fig.  **. 

If  the  estimate,  x(k|n)  is  desired,  one  could  proceed  as  follows; 

1.  Compute  and  save  x(klk)  and  C(k)  for  k=o,...,n  using  (2.51 )» 
(2.52)  and  (2.53)  or,  equivalently  (2.51  ), (2.5*0  and  (2.55). 

2.  Calculate  x(kln)  by  backward  recursion  of  (2,3*0  and  (2.50). 
This  procedure  begins  by  calculating  _\(n-1  )  using  (2.3*0. 
Then  obtain  x(n-1 1  n)  using  (2.50).  Use  (2.3*0  to  obtain 
X(n=,2)  and  then  (2.50)  to  obtain  x(n-2ln)„  Continue  this 
alternating  procedure  until  x(oln)  is  obtained. 

An  alternate  procedure  would  be  to  combine  (2.50)  and  (2,3*0  to 
obtain  the  following  equation  for  Xj 

X(k-1 )  =  [I  -  H»  (k)R~1  (k)H(k)C(k)  ]  F'(k)A(k) 

+  H'(k)R°1(k)  [z(k)-H(k)x(klk)]  (2.56) 

The  following  matrix  identities  can  easily  be  derived  by  the 
methods  of  Appendix  A; 

I  -  h'iHh  [  i  +  ra»fr1H]~1p   =  [i  +  h»r"1hp  ]~1 

Using  these  identities,  (2«56)  may  be  rewritten  in  different  forms. 
Then  step  2  above  can  be  replaced  by  the  following; 
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2.   Calculate  _^(k)  by  backward  recursion  of  a  convenient 
form  of  (2.56).  Then  calculate  x(k|n)  using  (2.50). 
By  repeated  use  of  matrix  identities  (2.56)  may  be  written  in 
the  following  form  which  later  will  be  compared  with  other  results; 

A(k-1)  =  {l  -  H'(k)  [R(k)  +  H(k)P(k)H' (k)  ]  ~1  H(k)P(k)J  P»  (k)A(k) 

+  H'(k)  [H(k)P(k)H'(k)  +  R(k)]~1  [  z(k)-H(k)x(klk-1 ).]  (2.57) 

The  solution  of  the  linear  smoothing  problem  may  be  given  in  an 
even  simpler  form  if  we  write  the  equations  in  terms  of  x(k|k-1).  Later 
when  we  discuss  the  application  of  linearization  techniques  to  nonlinear 
estimation  problems ,  the  formulas  we  shall  obtain  will  be  analogous  to 
the  ones  already  given  for  the  linear  problem,,  since  we  shall  wish  to 
linearize  about  the  point  x(kl  k)  rather  than  the  point  x(klk-1). 

Combining  (2.50)  and  (2.5^)  using  the  matrix  identities,  we  obtain 
the  following  equation; 

x(kln)  =  i(klk-1)  +  P(k)H»(k)[H(k)P(k)H«(k)  +  R(k)]  ~1 

[z(k)-H(k)x(klk-1)  ]+   P(k){  I  -  H»  (k)  [  H(k)p(k)H»  (k)  +  R(k)]~1 

H(k)P(k)}  F« (k)  J^(k) 

Combining  this  relation  with  (2.57)  yields 

x(kln)  =  x(k|k-1)  +.P(k)V(fc-1) 

We  may  summarize  this  result  in  the  following  corollary; 

Corollary;  The  solution  of  the  linear  smoothing  problem  is  given  by 
the  following  equations; 
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x(kln)  =x(k|k-l)  +  P(k)A(k-1) 

x(k+1lk)  =  F(k)x(k|k-1)  +  F(k)P(k)H«  (k)  [  H(k)P(k)H»  (k)  +  R(k)]  "1 

[z(k)  -  H(k)x(kl  k-1  )  ] 

where  P(k)  is  given  by  (2.55)  and-  \(k^  )  is  given  by  (2.57)  for  k=o,...,n. 
The  a  priori  mean  m  is  used  for  x(o  1-1 )  in  the  above  expressions  and 
\(n)   =  o. 

2.73   The  Existence  of  [I  +  PH'iHh  ]~1 

In  the  following  discussion  we  shall  use  some  matrix  identities 
and  the  following  elementary  facts  from  matrix  theory  developed  in  Appendix  A; 

i.   The  inverse  of  a  positive  definite  matrix  is  positive 
definite, 
ii.   The  matrix  formed  by  adding  a  positive  definite  matrix 
to  a  non-negative  definite  matrix  is  positive  definite 
and  hence,  has  an  inverse. 
One  may  verify  by  direct  multiplication  that  the  unique  inverse 
of  I  +  PH'R""1H  is 

I  =  PH'  [HPH'+R]=1H  (2.58) 

If  Jl  is  positive  definite  and  P  is  non-negative  definite,  (2.58)  will 
always  exist  by  virtue  of  ii.  Using  (2.58),  C(k)  may  be  written  in  the 
following  form? 

C(k)  =  P(k)  -  P(k)H'(k)  [  H(k)P(k)H'Xk)  +  R(k)]  "1  H(k)P(k)     (2.59) 

The  relation  (2.59)  is  a  generalization  of  a  lemma  given  by  Ho  [  25],  who 
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required  that  P(k)  be  positive  definite. 

While  the  inverse  of  a  matrix  is  unique,  it  frequently  may  be 
written  in  various  forms  which  bear  little  resemblance  to  each  other.  If 
P(k)  is  positive  definite  for  example,  the  inverse  of  [  I  +  PH'R  H  1  may 
be  written  in  the  following  form; 

[  Jf 1  +  S'lf  12 ]  °1 1"1  (2»60) 

which  surprisingly  is  identical  to  (2.58).  Using  (2.60),  we  may  write 
(2o51 )  in  the  following  form; 

C(k)  =  [  P~1 (k)  +  H' (k)R=1 (k)H(k) ]  ~1  (2.61 ) 

From  (2.61)  and  ii  it  follows  that  if  P(k)  is  positive  definite,  then  C(k) 
is  positive  definite.  This  is  in  accord  with  the  fact  that  P(k)  and  C(k) 
are  the  covariance  matrices  of  the  probability  distributions  for  x(k) 
given  {  z(o),oo0!>z(k-l)J  and  {  z(o),.„.  ,z(k)  ]  respectively.  A  singular  co- 
variance  matrix  signifies  that  x(k)  is  confined  with  probability  one  to  a 
hyperplane  in  n=space.  It  is  clear  that  this  cannot  happen  as  the  result 
of  measurements  which  are  corrupted  with  noise  having  a  non-singular  co- 
variance  matrix  R(k). 

Consider  (2.52),  which  we  rewrite  for  convenience, 

P(k+1  )  =  F(k)C(k)F' (k)  +  G(k)£(k)G» (k)  (2.52) 

From  (2.52)  we  see  that  a  sufficient  condition  for  P(k+1 )  to  be  positive 
definite  is  that  either  G(k)£(k)G ' (k)  be  positive  definite  or  C(k)  be 
positive  definite  and  F(k)  be  non-singular.  This  may  easily  be  related 
to  the  basic  equation  for  the  system, 

x(k+1 )  =  F(k)x(k)  +  G(k)w(k)  (2.31 ) 
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From  (2.31 )  we  see  that  P(k+1 )  will  be  positive  definite  if  every  point 
in  n-space  may  be  reached  in  one  transition  from  some  point  in  the  set 
of  possible  values  of  x(k)  (specified  by  C(k)  )  using  some  w(k).  Thus, 
the  condition  obtained  from  (2.52)  is  seen  to  be  sufficient  but  not  nec- 
essary,, 

Finally,  we  note  that  if  P(o),  R(k)  and  F(k)  are  non-singular  for 
all  k,  then  P(k)  and  C(k)  are  also  positive  definite  for  all  k„ 

2„7k      Generalizations 

The  restrictions  that  P(o)  and  R(k)  be  positive  definite  may  be 
relaxed  without  altering  the  basic  structure  of  the  solution.  In  order 
to  relax  these  restrictions  we  use  the  generalized  inverse  discussed  in 
Appendix  BQ  Appropriate  background  material  concerning  probabilistic 
aspects  of  singular  covariance  matrices  is  given  in  Appendix  C.  In  this 
section  we  shall  simply  state  the  results  obtained  in  Appendix  D  and  com- 
pare them  with  those  already  obtained  in  this  chapter.  For  a  more  detailed 
discussion  the  reader  is  referred  to  Appendix  D,  which  considers  a  very 
general  version  of  the  linear  estimation  problem  in  which  correlation 
between  v  and  w  is  allowed  and  the  covariance  matrices  P(o)  and  R(k)  may 
be  singular. 

The  solution  of  the  linear  estimation  problem  is  given  by  the 
following  equations  which  are  the  same  as  (2.50),  (2.59),  (2.52),  (2.5^) 
and  (2.57)  except  that  in  (2.59),  (2.5^)  and  (2.57)  the  inverse  is  re- 
placed by  the  generalized  inverse? 

x(k|n)  =  x(k|k)  +  C(k)F» (k)^(k)  k=o,...,n       (2.50) 

C(k)  =  P(k)-P(k)H»  (k)  [  H(k)P(k)H' (k)  +  R(k)]#H(k)P(k)         (2.62) 
P(k+1  )  =  F(k)C(k)F« (k)  +  G(k)£(k)G» (k)  (2.52) 
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x(k|k)  =  x(klk-1)  +  P(k)H«  (k)  [  H(k)P(k)H'  (k)  +  R(k)]* 

[  z(k)-H(k)x(k!k-1)]  k=1,..,,n       (2.63) 

A(k-1 )  =  (  I  -  H« (k)  [  R(k)  +  H(k)P(k)H« (k)]#  H(k)P(k) j  F' (k)A(k) 

+  H'(k)  [  R(k)  +  H(k)P(k)H»(k)]#  [  z(k)-H(k)x(klk-1 )] 

k=1,...,n       (2.64) 
The  boundary  conditions  are 

x(olo)  =  a  +  P(o)H»(o)f  H(o)P(o)H»(o)  +  R(o)]#  [  z(o) -  H(o)m]  (2.65) 

\(n)  =  o 

The  generalized  inverse  may  be  replaced  by  any  pseudo-inverse.  The 
pseudo-inverse  was  used  by  Kalman  [29  ]  in  his  solution  of  the  linear  fil- 
tering and  prediction  problem.  We  have  extended  this  work  to  include  the 
smoothing  problem  and  have  relaxed  the  restriction  that  F(k)  be  non-singular. 
The  only  restrictions  remaining  are  that  R(k)  and  P(o)  be  valid  covariance 
matrices  and  that  observations  do  not  occur  which  are  inconsistent  with 
prior  knowledge.  This  point  is  discussed  in  Appendix  C.  The  block  diagram 
of  Fig.  7 9   in  which  the  estimator  producing  up-to-date  estimates  of  present 
state  variables  consists  of  a  model  of  the  system  in  a  linear  feedback 
loop,  remains  valid  if  C(k)HJ (k)R°  (k)  is  replaced  by 

P(k)H « (k)  [  H(k)P(k)H» (k)  +,  R(k)  ]  # 

These  two  expressions  are  equivalent  if  IjH  exists. 

If  one  is  interested  in  the  linear  smoothing  problem,  the  corollary 
of  section  2.72  remains  valid  if  the  inverse  in  the  equation  for  x(k+l|k) 
is  replaced  by  the  generalized  inverse. 


^53- 


2o8   An  Approximation  Technique 
2a81  Linear  Observations 

There  is  a  great  need  for  approximation  techniques  which  can  be 
used  to  obtain  estimates  in  real  time.  In  this  section  we  shall  present 
a  method  which  is  computationally  attractive  and  which  possesses  some 
intuitive  appeal 0 

For  simplicity  we  shall  first  assume  that  h(x)  is  linear,  an 
assumption  which  will  appreciably  simplify  the  resulting  equations  and 
which  represents  a  case  of  great  practical  significance e  In  this  case  we 
say  that  the  observations  are  linear,,  Consider  the  approximation  to  (2„1 ) 
obtained  by  the  following  linearization ° 

x(k+1)*  f(x*(k|k),k)  +  (3f(x*(klk),k)/ax)  [x(k)=x*(klk)]  +G(k)w(k)  (2066) 

The  point  x  (klk)  about  which  the  linearization  is  made  is  the  estimate 
of  x(k)  given  {  z(o),,9.,z(k)]0  We  use  the  asterisk  to  remind  ourselves 
that  the  estimates  produced  by  this  linearization  procedure  are  only 
approximations  and  do  not  necessarily  correspond  exactly  to  the  mode  of 
the  a  posteriori  distribution „ 

If  the  function  In  of  (2„9)  is  replaced  by 

In  =  4ll*(o)-sll  -    +   E  i  ll£(k)-H(k)x(k)  ||  1 

+  E  {  illsOO  II   „    +  A'(k)  [[x(k+1)~f(x*(klk),k)] 
feo  l         £"1  (k)        k 

-  Uf(x*(k|k),k)/3x)  [x(k)-x*(k|k)]  -  G(k)w(k)|}      (2e6?) 

and  the  partial  derivatives  of  In,  with  respect  to  w(k),  A(k)  and 
x(k)  are  set  equal  to  zero,  a  modified  version  of  the  two-point 
boundary  value  problem  is  obtained c   The  resulting 


equations  are  as  follows; 

x*(k+1ln)  =  f(x*(klk),k)  +  (3f(x*(klk),k)/ax)  [  x*(k|n)-x*(klk)J 

+  G(k)£(k)G» (k)>(k)  (2.68) 

\(k-1)  =  (af'(x*(klk)9k)/ax)  2(k)  +H'(k)R"1(k)  [  z(k)-H(k)x*(kln)] 

(2.69) 
The  boundary  conditions  are 

X(n)  =  o  (2.70) 

x*(o  In)  =  m  +  P(o)H»  (o)R°1  (o)  [  z(o)-H(o)x*(o|  n)  ] 

+  P(o)(3f'(x*(o|o),o)/ax)  X(o)  (2e?1) 

A  comparison  of  (2.71)  and  (2.35)  indicates  that  (2..71  )  may  be  written 
in  the  following  form  which  is  similar  to  (2«*H  )  ; 

x*(oln)  =  x*(o|o)  +  C*(o)(3f'(x*(olo),o)/ax)  ^(0)  (2,72) 

where  x  (olo)  and  C  (o)  are  equal  to  x(o|o)  and  £(0)  respectively  and  are 
given  by  (2.40)  and  (2.39).  It  is  not  at  all  surprising  that  the  solution 
of  this  linearized  problem  closely  resembles  that  of  the  linear  problem 
already  studied.  We  define  the  following  quantities; 

C*(k)  =  [  I  +  P*(k)H«  (k)R~1  (k)  ]  "1  P*(k)  (2.73) 

P*(k+1)  =  (  3f(x*(klk),k)/3x)  C*(k)  (3fI(x*(klk),k)/3xj 

+  G(k)£(k)G'(k)  (2.7^) 


* 


I   (o)  =  P(o)  (2.75) 
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We  now  hypothesize  that  the  x  (k|n)  satisfying  the  two-point 
boundary  value  problem  (2.68),  (2.69),  (2.70)  and  (2.71)  is  given  by 
the  following  equations; 

x*(k|n)  =  x*(k|k)  +  C*(k)(a  f • (x*(klk)9k)/a  x)  >(k)  k=0,...,n  (2.76) 

where 

x*(k|k)  =  f(x*(k-1  |k-l),k-l)  +  C*(k)H'(k)R~1(k) 

[z(k)-H(k)f(x*(k-l|k»l),k-l)]  k=1 n  (2.77) 

and  x  (o|o)  is  given  by  (2.40).  We  again  proceed  by  induction  showing 
that  if  (2.76)  is  satisfied  for  some  k  £  (o,...,n-l)  then  it  is  also  sat- 
isfied for  k+1  „  We  know  that  (2.76)  is  satisfied  for  k=o.  The  steps  in 
the  following  proof  are  completely  analogous  to  those  used  to  prove  the 
theorem  of  section  2.72. 

Suppose  that  x  (k|n)  satisfies  (2.76)  for  some  k  €  (o,...,n-1). 
Then,  by  (2.68) 

x* (k+1  In)  =  f(x*(k|k)9k)  -  (af(x*(klk),k)/dx)  x*(klk) 

+  (af(x*(klk),k)/3x)  x*(klk) 

+  {  (sf(x*(k|k),k)/3x)  C*(k)  (Sf'(x*(klk),k)/3x) 

+  G(k)£(k)G'(k)]  X(k) 

or,  equivalently 

x*(k+1ln)  =  f(x*(klk),k)  +  P*(k+1)  X(k) 

Substituting  for  X(k)  from  (2.69)  this  expression  becomes 
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x*(k+1|n)  =  f(x*(klk),k)  +  P*(k+1)H'(k+1)R~1(k+1) 

[z(k+1)  -  H(k)x*(k+1|n)] 

+  P*(k+1  )(3  f  •  (x*(k+1  lk+1  ),k+1  )/a  x)  ^(k+1 ) 

Solving  this  expression  for  x  (k+1  In)  yields 

x*(k+1  In)  =   [  I  +  P*(k+1  )H«  (k+1  )R"1  (k+1  )H(k+1 ) J  " 

|  f  (x*(k  Ik)  ,k)  +  P*(k+1  )H«  (k+1  )R"1  (k+1  )z(k+1  ) 

+  P* (k+1  )(sf'(x* (k+1 1  k+1 ) ,k+1 )/ a x)  A(k+1 )  ] 

This  expression  may  be  rewritten  in  the  following  form; 

x*(k+1 1  n)  =  f (x*(k Ik) ,k)  +  C*(k+1  )H«  (k+1  )R"*1  (k+1  ) 

[  z  (k+1  )~H(k+1  )f  (x*(k Ik) , k)  ] 

+    C*  (k+1  )(3f'(x*  (k+1 1  k+1 ) ,  k+1 )  /a  x)  i(k+1 ) 

Using   (2,77 )»  this  expression  becomes 

x*(k+1 1  n)  =  x*(k+1  I  k+1 )  +  C*(k+1  )(a  f « (x*(k+1  I  k+1  ),k+1  )/a  x)  J\(k+1 ) 

This  expression  satisfies  the  hypothesis  (2.77)  and  noting  that  (2.77) 
is  satisfied  for  k=o  concludes  the  proof. 

Equations  (2.73),  (2.7*0  and  (2.77)  constitute  a  recursive  scheme 
in  which  each  new  observation  can  be  processed  immediately  to  produce  an 
up-to-date  estimate.  The  matrices  C  and  P  can  no  longer  be  strictly 
interpreted  as  covariance  matrices  of  the  distribution  for  x»  as  they 
could  be  if  the  basic  system  were  actually  described  by  (2,66)  instead 
of  (2.1). 
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A  block  diagram  of  the  estimator  described  by  equation  (2.77)  is 
given  in  Fig«  8.  Again  we  see  the  model  of  the  system  appearing  in  a 
feedback  loop.  Equation  (2.77)  or,  equivalently,  the  estimator  of  Fig.  8 
may  be  given  a  simple  intuitive  interpretation  if  we  interpret  the  in- 
dividual terms  in  the  following  manner; 

f(x  (k-1 |k-l),k-l)  a  prediction  of  x(k) 

C  (k)  a  measure  of  present  uncertainty 

H' (k)  sensitivity  of  observation  to  deviations  in  x(k) 

R~1 (k)  reliability  of  measurements 

H(k)f (x*(k-1 I k-1 ) ,k-1 )  a  prediction  of  z  (k) 

The  equation  then  has  the  interpretation  that  the  error  in  the  prediction 
of  z_(k)  is  weighted  in  proportion  to  present  uncertainty 0  sensitivity  of 
the  observation,  and  reliability  of  the  measurement «,  This  weighted  error 
is  then  used  to  modify  the  predicted  value  of  x(k)  to  obtain  an  up-to-date 
estimate  of  x(k)„ 

Equations  (2,73 )»  (2„7^)  and  (2e77)  or  equivalent  equations  ob- 
tained by  the  use  of  matrix  identities  are  particularly  suitable  for  real 
time  implementation  on  a  digital  computer 0  Simulation  studies  may  be 
conducted  to  evaluate  the  applicability  of  this  approximation  to  a 
particular  system.  The  results  of  some  simulation  studies  using  this 
linearized  estimation  technique  are  given  in  Chapter  V« 

When  a  detailed  analysis  of  all  available  data  is  to  be  made  after 
the  completion  of  some  experiment  such  as  a  rocket  flight,  one  is  faced 
with  numerical  solution  of  the  nonlinear  two-point  boundary  value  problem « 
Some  method  of  successive  approximations,  such  as  a  version  of  the  gradient 
method  of  KeHey  [35  ]  and  Bryson  [  9] ,  is  usually  called  for«,  There  is 
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danger  of  converging  to  a  local  minimum  if  the  conditional  probability 
density  function  (207)  is  not  unimodal.  This  danger  is  reduced  and  the 
speed  of  convergence  increased  if  a  good  first  approximation  is  available. 
The  logical  choice  for  such  a  first  approximation  is  x  (k|n)  which  may  be 
calculated  by  backward  recursion  of  (2.69)  and  (2.76).  Alternately,  we 
may  combine  (2C69)  and  (2,76)   to  obtain  the  following  equation  for  A_ 
which  is  similar  to  (2.56) ? 

A(k-1)  =  Li  -  H'(k)r1(k)H(k)c*(k)]   (3f'(x*(klk),k))/ax)  A(k) 

+  H'(k)R"1(k)  [  z(k)  -  H(k)x*(klk)]  (2.78) 

Then  x  (kin)  may  be  calculated  using  (2.78)  and  (2.76). 

2.82   Generalizations 

Having  obtained  some  confidence  that  linearization  techniques  will 
produce  equations  closely  resembling  the  equations  for  the  linear  estima- 
tion problem,,  there  are  several  directions  in  which  one  could  generalize. 
Firste  one  could  complicate  the  problem  without  introducing  new  nonlinear- 
ities  by  allowing  R(k)  and  P(o)  to  be  singular  and  permitting  correlation 
between  w  and  v.  That  is,  one  could  study  the  problem  of  Appendix  D  using 
(2.66)  in  place  of  (2.1 )o  Second,  one  could  introduce  new  nonlinearities 
by  allowing  h  to  be  nonlinear  and/or  allowing  w  to  enter  nonlinearly  as 
in  (2.3) o  Third,  one  could  combine  these  first  two  types  of  complications. 

We  shall  discuss  each  of  these  possibilities  briefly,  indicating 
how  the  results  obtained  previously  would  be  modified  in  these  situations, 
but  omitting  detailed  derivations. 

The  generalization  to  include  singular  R(k)  and  P(o)  and  allow 
correlation  between  v  and  w  is  completely  straightforward.  The  derivation 
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of  these  results  is  completely  analogous  to  section  D5  of  Appendix  D 
and  may  easily  be  obtained  by  using  section  D5  as  a  guide.  The  basic 
equations  which  result  from  such  a  derivation  are  as  follows; 

x*(k|k)  =  x*(k|k-1)  +  P*(k)H<(k)B*(k)  [  z(k)~H(k)x*(klk-1)]     (2.79) 

x*(k+1  |k)  =  f (x*(k|k),k)  +  G(k)S(k)B*(k)  [  z(k)-H(k)x*(klk-1 )]   (2.80) 

x*(k|n)  =  x*(klk)  +P*(k)  [  (3f'(x*(klk),k)/3x)-H'(k)M*(k)  ]A(k) 

k=o,„.„,n    (2.81) 

>(k-1)  =  [  (af'(x*(klk),k)/ax)  -  H'(k)M*(k)l  >.(k) 

+  H«(k)B*(k)  [  z(k)»H(k)x*(k|k-l)]      k=1,„..,n    (2.82) 

P*(k+1)  =  (af(x*(klk),k)/ax)  P*(k)  (af '  (x*(klk),k)/e>  x) 

+  G(k)a(k)G'(k)  -  M*'(k)B*#(k)M*(k)  (2.83) 

B*(k)  =  [H(k)P*(k)H'(k)  +R(k)J  #  (2.84) 

M*(k)  =  B*(k)  [H(k)P*(k)(3f«(x*(klk),k)/ax)  +  S«(k)G'(k)]     (2.85) 

A  block  diagram  of  the  estimator  described  by  equation  (2.79)  and 
(2.80)  is  given  in  Fig.  9.  Again  we  have  a  model  of  the  system  appearing 
in  a  feedback  loop. 

If  new  nonlinearities  are  introduced ,  then  we  are  forced  to  further 
linearization.  We  shall  first  consider  the  case  in  which  h  is  nonlinear. 
In  order  to  acquire  a  feeling  for  this  situation  we  shall  examine  the 
simplest  possible  example,  n=o.  Then, 

2  2 

I0=    £l|x<o)-ffill      -  +    ilU(o)-h(x(o),o)  II 

r1(o)  F1(°)  (2o86) 
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Dashed  lines  indicate  x  (k~1 lk=1)  is  needed  to  calculate  P  (k)  and  B  (k) 


Fige  9   Nonlinear  discrete- time  estimator  when  v(k)  and  w(k)  are  correlated. 
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or,  equivalently, 

2  2 

I  =  III  x(o)-m||   -    +  illz(o)-h(mto)-  [  h(x(o),o)-h(m,o)]  || 

0      "       r1(o)  r-  1(o) 

(2.8?) 
This  may  be  rewritten  as 

2  2  2 

I0  =illx(o)-ra||  1    +i||z(o)»h(ra,o)||  1    +  i||h(x(o),o)-h(nto)||  1 

P  (°)  f  W  I ;  <°> 

-rh(x(o),o)-h(m,o)J«  R~1  (o)  [z(o)=h(m,o)]  (2e88) 

The  second  term  in  this  expression  is  independent  of  x.  The  third  and 
fourth  terms  we  expand  about  the  a  priori  mean  m  retaining  the  first  non= 
zero  terms  o  The  fourth  term  becomes 

(x(o)-m)«(3h«(m,o)/3x)  R"1  (o)  [  z(o)-h(m,o)]  (2.89) 

The  third  term  becomes 

i  (x(o)~m)«  {  3[(3h,(|,o)/3x)  R-1(o)  h(m,o)]/ax  ]  (x(o)-m)    (2o90) 
Let 

C~1(o)  =  H(o)  +  3[(3h'M/3x)  r1(o)  h(m9o)J/3x        (2.91) 
Then  (2.86)  may  be  approximated  by  the  following  expression 

2  2 

i0  ~  *  Bs(o)-a  II  1  +  i  II  i(o)-k(s»o)  II  1 

r1(o)  r  (o) 

+  (x(o)-m)»(ah'(m,o)/c£>x)  R=1  (o)  [z(o)-h(m,o)]  (2.92) 
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Differentiating  this  expression  with  respect  to  x(o)  and  setting  the 
result  equal  to  zero  yields 

x*(olo)  =m+C(o)(3h'(m,o)/3x)r1(o)  [  z(o)-h(m,o)  ]        (2093) 

Equations  (2.91)  and  (2.93)  are  similar  in  structure  to  (2.72)  and  (2.77)« 
If  we  interpret  (  5h'  (m,o)/  3x)  as  the  sensitivity  of  the  observations  to 
deviations  of  x  from  the  prior  mean,  we  may  give  (2.93)  a^d  interpretation 
similar  to  that  which  was  given  in  (2„77)»  The  same  basic  structure  is 
retained  for  arbitrary  n.  Note  that  h(x(k),k)  is  linearized  about 
x  (klk-1)  since  x  (k|k)  cannot  be  computed  until  z^(k)  is  observed.  The 

expression  for  C(k)  is  of  the  same  form  as  the  expression  for  C(o)  given 

i 

by  (2.91 )o  P(k+1 )  is  computed  by  (2.74)  as  before. 

Let  us  again  consider  the  case  n=o9  but  now  allow  R(o)  to  be  sing- 
ular. In  this  case,  following  Appendix  D,  we  consider  v  to  be  the  result 
of  a  linear  transformation  on  a  normalized  random  vector  u.  That  is, 
X  =  Jji  where  TT1  is  equal  to  R(o).  Introducing  the  Lagrange  multiplier 
vector  £,  the  expression  to  be  minimized  becomes 

2  2 

I0  =  *ll^°)-all  «    +  ills  II   +  i'  [z(o)=h(x(o),o)»Tu]    (2.9^) 
P"1(o) 

This  expression  will  be  approximated  by  the  following  linearized  expression; 

2  2 

I0  =  *ll£(o)-sll   -    +  ills  II   +I'f  z(o)-h(m,o)-(3h(m,o)/ax) 
r1  (o) 

[  x(o)=m]  -  Tu}  (2.95) 

ak 

Setting  the  total  differential,  dl  (x(o),u,£),  equal  to  zero  yields  the 
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following  equations; 

x*(olo)  =  m  +  P(o)Oh»(B,o)/3x)  ±  (2.96) 

z(o)  =  h(m,o)  +  (^>h(m,o)/ax)   [  x  (oio)-m]   4     Tu  (2e97) 

u  =  T'i  (2.98) 

Combining  these  three  equations,  we  obtain 

z(o)-h(m,o)  =    [  Oh(m,o)/ax)  P(o)   (a  h>  (m,o)/3  x)  +  R(o)]  §.(0)   (2.99) 

Using  the  generalized  inverse  to  solve  for  a  ^(o)  and  combining  (2.96) 
and  (2.99)  yields 

x(o)  =  m  +  P(o)  (3h»(i,o)/5x)  [  (  3h(m,o)/a  x)  P(o) 

(ah«(m,o)/3x)  +  R(o)]#  [z(o)-h(m,o)]  (2.100) 

Note  that  (2.93)  and  (2.100)  are  not  the  same,  nor  can  they  be 

shown  to  be  equal  using  matrix  identities  even  if  R"  (o)  exists.  The 

reason  for  this  situation  is  that  if  R°  does  not  exist  we  are  forced  to 

linearize  earlier  than  if  R~  exists.  The  procedure  we  have  used  for 

singular  R  is  equivalent  to  approximating  the  second  term  in  (2.88)  by 

the  following  expression? 

2 
2  l|z(o)-h(m,o)-(3h(m,o)/©x)  [x(o)-m]||  1 

£  (o) 

Here  the  approximation  takes  place  before  the  multiplication;  whereas, 
previously  we  were  able  to  multiply  and  then  approximate  the  resulting 
expression. 

The  basic  structure  of  the  situations  which  arise  when  h  is  non- 
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linear  is  well  illustrated  by  the  simple  case  n=o.  The  required  modifica- 
tions of  our  previous  results  are  evident  in  (2,91 )#  (2o93)  and  (2.100). 

Suppose  that  w  entered  the  system  nonlinearly.  We  would  then 
approximate 

x(k+1)  =  f(x(k)9  w(k),  k)  (2.3) 

by 

x(k+1)c^  f(x*(klk),o,k)  +  (a  f(x*(klk),o,k)/ax) 

|x(k)-x*(klk)J  +  (&f(x*(k|k),ofk)/aw)  w(k)       (2.101) 

It  is  clear  that  (s  f (x*(klk),o,k)/a  w)  will  play  the  role  of  G(k)  in 
our  previous  work. 

The  reader  should  have  no  difficulty  in  combining  these  results 
to  meet  the  needs  of  any  particular  problem,,  How  well  these  approximation 
techniques  will  work  will,  of  course,  depend  on  how  well  the  system  is 
approximated,  the  frequency  of  measurements,  noise  levels  and  other 
factors  peculiar  to  the  individual  problem,,  Similar  linearization  schemes 
have  been  developed  from  another  point  of  view  in  recent  independent 
studies  by  other  investigators  [37*  59]  who  give  encouraging  results, 

2083   Linearity  in  Disguise 

In  certain  special  cases  the  approximation  of  (2,1)  given  by  (2,66) 
becomes  exact  and  equations  given  by  the  linearization  technique  are 
strictly  correct „  We  rewrite  (2,1)  and  (2,66)  for  convenience, 

x(k+1 )  =  f  (x(k),k)  +  G(k)w(k)  (2,1 ) 

x(k+1)  =  f(x*(k|k),k)  +  (3f(x*(k|k),k)/c)x)[x(k)»x*(k|k)]+G(k)w(k) 

(2.66) 
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The  first  case  is  the  trivial  one  in  which  f(x(k),k)  is  linear0  That  is, 
when  (201 )  takes  the  following  form; 


x(k+1 )  =  F(k)x(k)  +  G(k)w(k) 


(2.31) 


The  second  case  arises  when  all  state  variables  are  exactly  measurable 
and  x  (k|k)  is  equal  to  x(k). 

Various  combinations  of  these  two  cases  may  arise  in  which  it  is 
not  so  obvious  that  the  underlying  estimation  problem  is  linear  even  though 
the  basic  equations  are  nonlinear.  Consider  the  case  in  which  the  state 
vector  x  may  be  decomposed  into  an  exactly  observable  part  x.  and  a  re- 
maining part  x^.  The  underlying  estimation  problem  will  be  linear  if  the 
basic  system  is  of  the  following  form; 


£j  (k+1  )" 

!-,.,(£,  00  »k) 

F^CXjOO.k)' 

'^(k)" 

4. 

\  00~ 

w(k) 

X2(k+1) 

0 

F^k) 

Xg(k) 

G2(k) 

(2  J  02) 

zOO  =  2 

i(k) 

(2.103) 

In  this  case  z   may  be  substituted  in  the  equation  for  x*   to  produce  the 
following  relation; 

z(k+1  )  -  fu  (z(k),k)  z(k)  =  F12(z(k),k)  XgCk)  +  G.,  (k)  w(k)      (2.104) 

The  relation  shows  that  a  difference  of  known  quantities  is  equal  to  a 
linear  combination  of  the  unknown  part  of  the  state  vector  plus  additive 
Gaussian  noise.  This,  together  with  the  following  linear  equation  for  x_ 


x2(k+1)  -  F22(k)  x2(k)  +  G2(k)  w(k) 
shows  that  the  estimation  problem  is  linear  and  the  distribution  for  x? 


(2.105) 
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remains  Gaussian  even  though  the  basic  system  (2.102)  is  nonlinear. 

For  example,  consider  the  following  simple  system  with  a  random 
parameter  X£? 

x1  (k+1 )  =  [  a  +  x£(k)  ]  X1  (k)  +  w-,  (k) 
x2(k+1)  =   b  ^(k)  +  w2(k) 
z(k)  =  x.,(k)  +  v(k) 

If  the  additive  noise  v  is  not  identically  zero  the  estimation  problem 
is  nonlinear  and  the  use  of  the  linearization  technique  would  give  only 
an  approximate  solution  to  the  estimation  problem.  If  v  is  identically 
zero  this  is  a  special  case  of  (2e102)o  Then  the  estimation  problem 
is  linear  and  the  linearization  technique  gives  the  exact  solution  to 
the  estimation  problem,,  We  see  from  this  simple  example  that  the  validity 
of  the  approximation  may  decrease  as  noise  levels  increase.  This  is  in 
addition  to  the  expected  decrease  in  the  performance  of  even  the  optimal 
estimator  when  noise  levels  increase. 

As  a  second  example,,  consider  the  following  system  in  which  the 
transition  matrix  F(k)  is  random; 

x(k+1 )  ■  F(k)x(k)  +  Gj (k)w(k)  (2.106) 

z(k)  =  x(k)  (2.107) 

The  elements  of  the  transition  matrix  F(k)  are  considered  to  be  the  out- 
puts of  a  linear  system  excited  by  an  independent  noise  sequence.  In 
order  to  see  how  such  a  situation  would  be  handled  we  shall  examine  the 
following  second  order  system; 
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"x1  (k+1  ) 

-f„(k) 

f12(k) 

x,(k) 

x£(k+1) 

_f21  (k) 

f22(k)    - 

_x2(k)  . 

z^k)     = 

x,(k) 

z2(k) 

"     X2 

(k) 

+  £,(102(10 


Let  us  augment  the  state-space  by  adjoining  the  random  parameters  to 
the  state  vector  as  follows; 


f^Oe+1) 
f12(k+1) 

f21  (k+1 ) 
f22(k+D 


x3(k+1) 
x^(k+1 ) 
x5(k+1) 
x6(k+1 ) 


A(k) 


x3(k) 

x4(k) 

x5(k) 

x6(k) 

G0(k)w(k) 


Then  the  equation  for  the  augmented  system  may  be  written  in  the  form  of 
(2.102)  as  follows; 


x.,(k)    x2(k)     0      0 
0       0      x.j(k)   x2(k) 


A(k) 


x^k) 

x2(k) 

x3(k) 

x^(k) 

x^(k) 

x6(k) 

x1  (k+1  ) 
x^k+1 ) 
x^k+1) 
x^  (k+1 ) 
x5(k+l ) 
x6(k+1) 

The  basic  technique  used  for  the  second  order  example  is  readily 
extended  to  the  nth  order  case0 

The  fact  that  a  nonlinear  estimation  problem  may  become  linear 
if  certain  noise  levels  go  to  zero  can  be  a  useful  tool  in  investigating 
nonlinear  estimation  problems »  If  we  observe  that  such  is  the  case  for 
a  particular  estimation  problem,,  we  immediately  acquire  an  upper  bound 
on  the  performance  that  can  be  obtained.  In  addition,  we  may  gain 


Gi(k) 


G?(k) 


w(k) 
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insight  into  the  coupling  between  various  noises  and  the  quantities  to 
be  estimated,,  For  example,  in  (2.104)  we  see  that  the  random  input  w 
is  playing  the  role  of  additive  measurement  noise  in  the  problem  of 
estimating  x^. 

Lest  we  become  complacent ,  consider  what  would  happen  if  for 
(2,102)  we  were  able  to  observe  only  Xg.  It  is  evident  that  we  would 
acquire  no  new  information  about  x. .  This  difficulty  raises  the  inter- 
esting question  of  observability  which  will  be  discussed  in  Chapter  IVC 

2.84   Successive  Linearization 

If  one  is  interested  in  a  problem  of  ex  post  facto  data  analysis 
the  job  of  obtaining  a  numerical  solution  of  the  two-point  boundary  value 
problem  arises.  As  was  mentioned  earlier,  a  logical  first  approximation 
to  the  solution  would  be  x  (k|n).  One  method  that  might  be  considered 
is  that  of  successive  linearization.  This  method  is  recommended  on  the 
grounds  that  it  is  comparatively  simple  and  that  rapid  convergence  can 
be  expected.  Unfortunately,  convergence  cannot  be  guaranteed,,  This  is 
the  price  to  be  paid  for  simplicity.  In  a  problem  where  there  is  no  re- 
quirement for  real  time  solution  it  may  be  worthwhile  to  try  this  method 
first  in  hopes  of  simply  obtaining  a  fast  solution  while  running  a  slight 
risk  of  non-convergence.  The  results  of  some  encouraging  applications 
of  this  technique  are  given  in  Chapter  V. 

The  idea  behind  the  method  of  successive  linearization  is  this. 
The  estimate  x  (k+1 1  k+1 )  is  generated  by  linearizing  about  the  point 
x  (klk).  After  the  entire  observation  sequence  (  z,(o),...  ,z_(n)j  has 
occurred,  the  smoothed  sequence  [  x  (k  |n)j ,  being  based  on  more  measurements, 
would  hopefully  be  a  better  trajectory  about  which  to  linearize  than 
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[x*(k|k)j  was.  One  can  then  use  a  modified  version  of  (2.77)  obtained 
by  linearizing  about  the  sequence  {  x  (k|n)j  to  reprocess  the  observations 
and  obtain  a  sequence  j  x^(klk)|  .  This  sequence  should  be  better  than 
[x  (klk)£  ,  since  the  linearization  took  place  along  a  better  trajectory. 
This  sequence  is  then  smoothed  using  modified  versions  of  (2.76)  and  (2.78) 
to  obtain  a  second  approximation  to  the  solution  {x^(kln)|0  The  procedure 
is  then  repeated  linearizing  about  {  x2(kln)j 0 

In  the  development  of  the  equations  for  this  technique  we  shall 
designate  the  sequence  {  x  (kln)j  by  {x..(k|n)j  to  emphasize  that  it  is  the 
first  approximation  to  the  solution .  Suppose  that  we  have  obtained  the 
ith  approximation  sequence  x.(k|n)  and  we  seek  approximation  i+1 .  Con~ 
sider  the  approximation  to  (2C1 )  given  by  the  following  linearization; 

x(k+l)^  f(3^(kln)fk)  +  (af(xi(k|n)tk)/-e>x)  [  xM-x^kln)  ] 

+  G(k)w(k)  (2.108) 

If  this  expression  is  substituted  into  (2.9)  a  modified  two=point  boundary 
value  problem  is  obtained; 

x^k+lln)  =  fC^CkliO.k)  +  (3f(x±(k|n),k)/ax)  [  x±+1  (klrO-x^kln)  ] 

+  G(k)£(k)G»(k)X(k)  (2.109) 

\(k~1)  =  (Bf'Cx^CklnJ.^/Bx)  A(k)  +  H«(k)Pf1(k)  [  z(k)-H(k)xi+1  (k|n)] 

(2.110) 
The  boundary  conditions  are 

X(n)  =  o  (2.111) 

xi+1(o|n)  =m  +  P(o)H'(o)R"1(o)  [  z (0)^(0)^  (o  In)] 

+  P(o)  (3r(Xi(o|n),o)/Sx)  X(o)  (2.112) 
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For  simplicity  we  first  assume  that  the  observations  are  linear 
and  that  R(k)  is  positive  definite,,  From  (2„109)  we  see  that  if  it  con- 
verges, the  successive  linearization  technique  converges  to  a  solution 
of  the  nonlinear  two~point  boundary  value  problem. 

The  boundary  condition  (2,112)  may  be  rewritten  in  the  following 
form; 

x^Coln)  =xi+1(o|o)  +C±(o)(3£Hxi(o\n),o)/Bx)   A(o)         (2,113) 

where  x   (o|o)  and  C.  (o)  are  x(o|o)  and  C(o)  respectively  as  given  by 
(2.39)  and  (2,40), 

As  usual,  we  assume  a  solution  of  the  following  form; 

xi+1(k|n)  =  SjL^Cklk)  +  C.(k)(3f « (^(kln^kj/a  x)  A(k)        (2,114) 

Substituting  (2,114)  into  (2,109)  yields 

x^  (k+1  In)  ^^(k+llk)  +  4(k+1)  ^(k)  (2,115) 

where  x       (k+1 1  k)  and  P.  (k+1 )  are  defined  as  follows; 

xi+1(k+1|k)  =  fQc^klnhk)  +  (3f(xi(k|n)Jk)/Sx) 

[%+1(k|k)  -  Xi(k|n)]  (2,116) 

P^k+1)  =  (3f(xi(k|n),k)/©x)  C^k)   (Bf>(x±(k\n),k)fex) 

+  G(k)£(k)G'(k)  (2,11?) 

Combining   (2,115)  and  (2,110)  yields 
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Xi+1  (k+1 1  n)  =  x^  (k+1 1  k)  +  ^(k+1  )H'  (k+1  )R~1  (k+1 )    [  z(k+1  )  -  H(k+1  )xi+1  (k+1 1  n)  ] 

+  P^k+lKBf'Cx^k+llrO.k+IJ/Bx)  \ (k+1 )  (20113) 

Solving   (2.118)  for  x..  (k+1  In)  we  obtain  the  following  relation  which 
satisfies  hypothesis   (2.114); 

x„  ^  (k+1 1  n)  =  x       (k+1  I  k+1 )  +  C.  (k+1 )  ("d  f '  (x,  (k+1  I  n)  ,k+1  )/3  x)  X(k+1 ) 
—1+1  "i+1  -3"  —   —i  —   — 

(2.119) 


where 


x       (k+1 1  k+1 )  =  x^  (k+1 1  k)  +  C^k+1  )H'  (k+1  )R~1  (k+1 ) 

[  z(k+1 )  -  H(k+1  )^+1  (k+1|  k)  ]  (2.120] 


and 
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C.  (k+1  )  =  [  I  +  P.  (k+1  )H'  (k+1  )R~1  (k+1  )H(k+1  )J  "     P.  (k+1 )  (2 .121  ) 

Jo  i  ■"■ 

We  have  now  derived  a  familiar  looking  set  of  equations.  The 
sequence  {x.   (klk)l  may  be  calculated  using  (2.116),  (201 17) »  (2.120) 
and  (2.121).  The  i+1  approximation  to  the  solution  f  x.+1  (k In)]  may  be 
obtained  using  (2.120)  and  (2.110). 

These  results  may  be  put  in  an  alternate  form;  combining  (2.110), 
(2.114)  and  (2.120)  yields 

X(k-1)  =  [I  -  H«(k)R°1(k)H(k)Ci(k)J  (Bf»(xi(k|n),k)/ax)  ^00 

+  [  I  -  !' (k)R°1 (k)H(k)C±(k)  ]  H» (k)R~1 (k)  [  z(k)~H(k)x±+1 (k |k»1 J 

(2  J  22) 
By  repeated  use  of  matrix  identities  this  equation  may  be  written  in  the 
following  form  corresponding  to  (2.57)? 
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X(k-1)  ={  I  -  H«(k)  [  Hdc^C^H'Ck)  +  R(k)]  ~1  H(k)P(k)  J 

Of'^klnJ.kJ/ax)  A(k)  +  H«(k)  [  HCk^CkjH*  (k)  +  R(k)J  ~1 

[z(k)  -  H(k)x(klk»1)]  (2.123) 

Combining  (2.116)  and  (2,120)  and  using  the  matrix  identity  (A. 2)  yields 

xi+1(k+1lk)  =  fCa^CkliO.k)  +  Of(x4(k|n)9k)/^x)  [2^+l(k|k-1)  -  ^(kln)] 

+  (Sf(xi(k|n),k)/©x)  ftM  H»(k)  [  H(k)P(k)H«  (k)  +  R(k)J  "1 

T  z(k)  -  H(k)xi+1  (k|k-1 )]  (2.124) 

We  may  also  write  the  following  equation  for  P.(k)  by  combining  (2.117) 
and  (2.121)  and  using  (2.58); 

P^k+1)  =  (af(xi(kln)9k)/3x)[  ^(k)  -  P^CkjH'Ck)  [  H(k)P(k)H*  (k)  +  R(k)]  " 

H(k)P(k)j(3f'(xi(k|n)9k)/Sx)  +  G(k)2(k)G»  (k)  (2.125) 

We  may  summarize  the  method  of  obtaining  the  i+1  approximation 
sequence  [x.+.  (k|n)]  from  the  previous  approximation  sequence  {  x^(k|n)  } 
in  the  following  three-step  procedure. 

1 e   Calculate  the  sequences  f x  . (k!k~1 )}  and  P. (k)  for 

k=o90.„en  using  (2.124)  and  (2.125)  respectively. 

The  procedure  begins  by  setting  x.+^ (o 1—1 )  and  P^(o) 

respectively  equal  to  m  and  P(o)  of  the  a  priori  dis- 

tribution  for  x(o)0 
2.   Calculate  the  sequence {  A(k)j  for  k=  ~10o.o,n  by  back= 

ward  recursion  of  (2.123)  using  the  boundary  condition 

X(n)  =  o. 


-7^ 


3.   Calculate  the  i+1  approximation  sequence  {  x.+.j  (k|  n)  j 
for  k=o,„099n  using  (20115)e 

The  same  technique  may  be  used  when  R(k)  is  singular  if  the  inverses 
appearing  in  (2„123)9  (2C124)  and  (2J25)  are  replaced  by  generalized  in=. 
verses „ 

The  method  may  easily  be  extended  to  include  the  complications  of 
nonlinear  observations,  correlation  between  the  random  input  w  and  the 
measurement  noise  v,  and  the  random  input  w  entering  nonlinearly0 
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CHAPTER  III 
ESTIMATION  OF  STATE  VARIABLES  FOR  CONTINUOUS-TIME  SYSTEMS 
g „ 1   Introduction 

The  mathematical  model  for  the  continuous-time  estimation  problem 
was  discussed  in  some  detail  in  Chapter  I0  The  explicit  problem  state- 
ment was  given  in  section  1  «A1 • 

The  continuous-time  estimation  problem  is  analogous  in  many  respects 
to  the  discrete- time  problem  of  Chapter  II „  In  the  continuous-time  prob- 
lem we  work  with  probability  density  functionals  instead  of  probability 
density  functions*  Specifically,  we  consider 


Px|z[s[to,  t  +  T]U[to.t] 


the  probability  density  functional  for  the  time  segment  of  x(t)  on  the 
interval  [t  ,  t  +  t]  given  the  observation  jz(t)  on  the  interval  ft 0»t]„ 
An  introductory  discussion  of  probability  density  functionals  is 
given  in  Appendix  Ec  For  the  purpose  of  this  chapter  it  will  be  suffi- 
cient to  note  that  the  summations  appearing  in  the  discrete- time  problem 
become  integrals  in  the  continuous=time  problem,, 

3o2   Preliminary  Considerations 

We  assign  a  Gaussian  a  priori  distribution  for  the  initial  state 
x(tQ)  with  mean  x(t0JtQ)  and  covariance  matrix  P(tQ),  It  will  be  con- 
venient to  assume  that  P(tQ)  is  positive  definite,  although  this  restrict 
tions  may  be  relaxed,, 

We  shall  use  t  to  designate  the  present  time,  tQ  to  designate 
the  time  observations  begin,  and  7'   as  a  general  time  parameter 0  The 
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estimate  of  x(^)  given  z_  a  £i  is  denoted  by  x(Y|t). 

In  order  to  estimate  xr.  .-i  when  G(t)2(t)G'  (t)  is  non-singular 
we  may  minimize  with  respect  to  xr.  . -i  the  following  functional  which 
is  the  continuous »time  analogue  of  (2.8) j 


2  ft  r  2 

+  *  J 


J(t)=i||2(t0)4(t0IVII  -1         +*J       lli(r)-h(x(r),Y)  || 

°   °  Y  (t0)       t0  I  r1(r) 


2  -| 

+  l|x(r)-f(x(r),T)  ||  „i       dr  (3.1) 

[GC£']        J 


Equivalently,  we  may  consider  minimizing  the  following  functional; 


Kt)  =iix(t0)4(t  it j  ii   1 

o  o   £-1(to) 


t  2  2 

+  i  f  jl|z(r)-h(x(r).r)  II  -   +||w(r)  ||  1   }dr 


(3.2) 


with  respect  to  w  r.   .-,  and  xr.  .-,   subject 
"Pot*]     -po*^ 


to  the  constraint 


x(y)  =  f(x(r),r)  +  GOr)w(Y)  (3.3) 

This  functional,  (3o2)  subject  to  the  constraint  (3»3)»  may  also  be  min- 
imized  when  G(t)£(t)G' (t)  is  singular. 

If  we  wish  to  estimate  xr.   t  +  Tl  0*2)  becomes 


*11* 


Kt,t  +  t)  =il|x<t0)4(t0iy  ||  .! 


t  2  2 

*  /  {l|z(r)-h(x(y),r)  II    <      +ll*(r)  II    - 
t;    I  r  (r)  £ 


(?) 


dr 


•  t  +  T 


iJ  SL    (T ) 


(3.*0 


That  is 


rt  +  T  2 

i(t,t  +  t)  =  i(t)  +i  /         ||  w(r)||     « 

tJ  £     (Y) 


dY 


(3.5) 


From  (3.5)  it  is  clear  that  I(t,t  +  T)  may  be  minimized  by  first  min- 
imizing I(t)  and  then  setting  w(f)  equal  to  zero  for  t  -  Y     -   t  +  T. 
This,  together  with  the  constraint  (3«>3)»  shows  that  predictions  are 
again  extrapolations  of  present  estimates „  We  may,  therefore,  confine 
our  attention  to  estimating  x^  ^-i. 

3»3   Dynamic  Programming  Formulation 

Note  that  because  of  the  constraint  (3e3)»  minimizing  (3»2)  with 
respect  to  wr,   .-,  and  x  r+  +-i  is  equivalent  to  first  minimizing  with 

respect  to  w  r,   ,-i  and  then  minimizing  with  respect  to  x(t).  This  leads 

"lt0,t|  - 

us  to  define  the  following  "cost"  function • 


V(x(t),t)  =  Min         <j  t  ||x(t)-*(tJt J   I  ^ 

°     °      P     (t  ) 


£[t0.t] 


t  2  2 

i    [      [lz(T)-h(x(Y),r)    1   .!  +Jw(Y)   ||    -1 

t{  R    (?)  q 


2 
I 

'g'VY 


d'i 


:3c6) 
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subject  to  the  constraint  (3<>3)o 

Note  that  minimizing  with  respect  to  wr+  +1  leaves  a  boundary 
condition  on  x  to  be  specified.  We  satisfy  this  boundary  condition  by 
specifying  the  final  point  x(t).  This  constitutes  a  departure  from  most 
dynamic  programming  problems  in  which  the  initial  condition  x(t  )  is 
specified.  Our  procedure  here  is  similar  to  that  used  in  case  i  of 
section  2„$<,   in  which  f~'  existed,,  No  similar  assumption  need  be  made 
here  because  integration  of  a  differential  equation  may  proceed  in  either 
direction,  whereas  there  is  a  definite  direction  associated  with  a  differ- 
ence  equation. 

We  may  give  V(x(t),t)  an  interpretation  similar  to  the  one  given 
to  V(x(n))  in  the  discrete-time  problem.  Let  £  be  any  particular  value 
of  x(t).  Then  V(c,t)  is  a  measure  of  the  unlikeliness  of  the  most  probable 
state  trajectory  xf    ,  in  which  x(t)  takes  on  the  particular  value  c, 
given  the  observation  z_,     and  the  a  priori  distribution  for  x(t  ), 
The  estimate  of  x(t)  is  that  value  of  x(t)  for  which  V(x(t),t)  is  minimum. 

Using  the  separability  property  we  may  rewrite  (3*6)  in  the  follow- 
ing form; 

V(x(t),t)  =  Min  J  V(x(t)-4  x,t~  At  )  +  i  f  '  \\]  z(T)-h(x(f),T)  ||      1 

^At9t][  *-At       L  s"  w 

2 

+  llw(r)  II    1  dr 

sr  (" ) 

subject  to  the  constraint  (3o3)o  Expanding  V(x(t),t)  in  a  Taylor  series, 
letting  At  approach  zero  and  using  the  constraint  (3,3),  we  obtain  the 
following  equation; 
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V.  =  Min  1   -  V«  [  f(x(t),t)  +  G(t)w(t)]   +   il|z(t)-h(x(t),t)  ||   1 

w(t)     £  R  (t) 

+  i  II  w(t)  ||  ^ 

A  (t)  J  (3.7) 

The  procedure  used  to  obtain  (3«7)  is  standard  in  the  dynamic 
programming  literature.  For  more  detailed  discussions  of  this  procedure 
and  its  relation  to  the  calculus  of  variations  see  [3,  5»  15.  16], 

The  w(t)  which  minimizes  (3*7)  is  easily  found  by  differentiation 
to  be 

w(t)  =£(t)G'(t)Vx  (3.8) 

Substituting  this  expression  for  w(t)  into  (3.7)  yields  the  following 
important  relation; 

2 
V    =  -  i  V»  G(t)£(t)G'(t)V    -  V     f(x(t),t)  +  i  ||  z(t)-h(x(t),t)  ||      1  , 
t  £  2E       2£  R  '  (t) 

(3o9) 
At  this  point  our  problem  is  twofold; 

i)   Find  a  function  V(x(t),t)  which  satisfies  (3»9)» 
ii)   Find  x(t  It).,  the  value  of  x(t)  for  which  V(x(t),t)  is  minimum,, 
A  boundary  condition  for  (3«9)  is  given  by  the  a  priori  distribution  for 

the  initial  state  x(t  ),  since  for  t  =  t  ,  (3„6)  becomes 

™    o  o 

V(x(t),t  )  =illx(t  )~x(t  |t  )  ||      1 

°°  000       p»1(to)  (3J0) 

Once  V(x(t),t)  has  been  found,  we  may  obtain  x(?lt)  for  t    -T    -  t 
by  backward  integration  of  the  following  equation  formed  by  combining 
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(3.3)  and   (3.8); 

ax(Ylt)/sr     =   f(x(r|t),r) +Q(r)a(T)G«(r)vx  (3.11) 

using  x(t  |t)  as  the  boundary  condition. 

There  is  no  difficulty  in  including  in  the  dynamic  programming 
formulation  systems  in  which  the  random  input  w  enters  nonlinearly.  For, 
example,  if  the  basic  system  is  described  by  the  following  equation; 

x(t)  =  f(x(t),t)  +  G(x(t),t)w(t)  (3.12) 

then  (3.8)  becomes 

w(t)  =£(t)G'(x(t),t)Vx  (3,13) 

and  (3*9)  becomes 

vt  =  "  *  Vx  G(x(t),t)£(t)G'(x(t),t)Vx  -  Vx  f(x(t),t) 

2 
+  i||z(t)-h(x(t),t)|[ 

R"1(t)  (3.1*) 

For  well  defined  systems  described  by  the  following  equation; 

x  =  f(x(t),w(t),t)  (3.15) 

(3«7)  becomes 

Vt  =  Min  )    -  V   f(x(t),w(t),t)  +  |||z(t)-h(x(t),t)  ||   . 

w(t)     -  R~1 (t) 


+  il|w(t)||    1 

2°1(t)j  (3.16) 
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and  (3.8)  becomes 

w(t)  =fi(t)(3f«(x(t)tw(t),t)/aw)V  (3.17) 

Equation  (3.1 6)  is  equivalent  to  two  equations:    (3.17)  and  the 
following  equation; 

2 
V+  =  -  V      f(x(t),w(t),t)  +  i||  z(t)-h(x(t),t)  ||      1 

+  i  II  s(t)  II    1 

2~1(t)  (3J8) 

We  shall  later  relate  (3.15)  through  (3.18)  to  the  two- point 
boundary  value  problem  obtained  by  Bryson  and  Frazier  [10J. 

3.4   Solution  to  the  Linear  Problem 

The  solution  of  the  linear  filtering  problem  discussed  in  section 
1.44  was  first  obtained  by  Kalman  and  Bucy  [34].  In  this  section  we  give 
a  simplified  derivation  of  this  solution  based  on  the  dynamic  programming 
formulation  of  the  problem.  In  order  to  simplify  the  resulting  equations 
it  will  be  convenient  to  omit  the  parameter  t  when  no  confusion  is  likely 
to  result. 

When  the  basic  system  is  linear  (3.9)  becomes 

2 

vt  =  '  *  V!  fi(t)a(t)2f(t)  vx  -  \    I(t)s(t)  +  ill  z(t)-H(t)x(t)  ||  _! 

£  -    -  R  (t) 

(3.19) 
It  is  well  known  [44,  45]  that  an  analytic  solution  may  be  obtained  for 
equations  of  this  type  by  expressing  V(x(t),t)  in  a  series  of  the  follow- 
ing form; 
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V(x(t),t)  ■  a(t)  +  b»(t)x(t)  +  i  x'(t)C(t)x(t) 

For  our  purpose  it  will  be  more  convenient  to  write  this  expression  in 

the  following  symmetric  form  in  which  the  minimum  of  the  function  is 

shown  explicitly? 

2 
V(x(t),t)  =  £||x(t)-x(t|t)  ||  ,         +k(t)  (3.20) 

P  (t) 


In  order  to  determine  x(t|t),  P~  (t)  and  k(t)  we  substitute  V 
and  V.  obtained  from  (3 .20)  into  (3.19)  and  equate  terms  of  like  degree 
in  x(t)0 

Differentiating  (3<>20)  we  obtain  the  following  equations  j 

V   =  P~1(t)  [x(t)-x(t|t)J  (3.21) 


x 


V^  =  4  x»  P 


1-1 

2    £~  P      a 


X'  P"1  +  X'  P 


x  +  d(i  x»  P"1x  +  k)/dt    (3.22) 


Substituting  (3.21)  and  (3.22)  into  (3.19)  and  equating  terms  of 
like  degree  in  x,  we  obtain  the  following  equations  after  some  algebraic 
manipulations  s 

P°1  =  .  P°1G^«  P~1  -  P"1F  -  F'P"1  +  H'R"1H  (3.23) 

dx(t|t)/dt  =  F(t)x(t|t)  +  P(t)H»(t)R~1(t)  [  z(t)-H(t)x(t|t)J    (3.24) 

0  °   1 

Using  the  identity  P  =  -  P  P~  P  we  may  write  (3.23)  in  the  following  form; 
£(t)  =  F(t)P(t)  +  P(t)F'(t)  -  P(t)H»(t)R-1(t®t)P(t) 

+  G(t)£(t)G«(t)  (3.25) 

The  final  form  of  the  optimal  linear  filter  is  given  by  (3.24) 
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Fig.  10   Optimal  linear  estimator 
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and  (3,25)  which  are  identical  to  the  equations  obtained  by  Kalman  and 

Bucy.  A  block  diagram  of  the  optimal  estimator  is  given  in  Fig.  10.  Here 

we  again  find  the  intuitively  appealing  feature  of  a  model  of  the  system 

appearing  in  a  feedback  loop. 

As  in  the  discrete- time  case,  P(t)  is  the  covariance  matrix  of  the 

probability  distribution  for  x(t),  given  the  observations 0  Equations  (3»24) 

and  (3o25)  remain  valid  when  P(t  )  is  singular  as  can  be  shown  by  taking 

the  limit  of  the  discrete-time  solution.  If  P(t  )  is  singular  the  dynamic 

—  o 

programming  formulation  breaks  down  because  trajectories  passing  through 
points  which  are  inconsistent  with  prior  information  become  infinitely 
more  unlikely  than  trajectories  which  do  not,  and  we  are  unable  to  reflect 
this  situation  in  a  scalar  "cost"  function,,  This  is  not  a  serious  drawback 

since  the  a  priori  distribution  may  be  chosen  to  approximate  the  state  of 

j 

prior  knowledge  arbitrarily  closely  without  introducing  a  singular  co- 
variance  matrix.  As  in  the  discrete-time  case,  this  difficulty  is  largely 
bypassed  if  GQG'  is  non=singular „  since  then  P(t  +  A  t)  is  positive  def= 
inite. 

3.5   Approximate  Solution  to  the  Nonlinear  Problem 

In  order  to  develop  an  approximate  solution  to  the  nonlinear  prob- 
lem we  follow  a  procedure  similar  to  the  one  used  in  discrete- time.  We 
may  rewrite  (3o9)  in  the  following  form; 

Vt  =  -  i  l|G'(t)Vx  ||    -  v'  f(x(t),t)  +  |  ||  z(t)~h(x*(t|t),t) 
-  2(t)    - 

-   h(x(t),t)~h(x*(t|t),t)  || 

r1oo  (3»26) 
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Since  the  estimate  of  x(t)  is  the  value  of  x(t)  for  which  V(x(t),t) 
is  minimum,,  we  would  expect  that  V(x(t),t)  could  be  closely  approximated 
in  a  neighborhood  of  this  minimum  by  a  quadratic  form  in  xe  Such  a 
quadratic  form  has  already  been  shown  to  be  the  solution  in  the  linear 
casee  We  shall  use  x  (t|t)  to  designate  the  estimate  produced  by  this 
approximation.  The  terms  on  the  right  hand  side  of  (3.26)  involving  f 
and  h  will  be  expanded  in  a  Taylor  series  about  the  point  x  (t|t).  The 
argument  t  should  be  understood  but  will  usually  not  be  written.  All 
partial  derivatives  are  to  be  evaluated  at  the  point  (x  (t|t),t). 
The  approximation  for  f  is  simply 


f(x(t),t)  *  f(x*)  +  Of/31)  [  x-x*  J  (3e27) 

The  term  involving  h  may  be  broken  down  into  the  following  sum  of  three 

terms; 

2  2 

£11  **(**)  II  ,  +  i|h(x)-h(x*)  ||    -  [h'(x)»h«(x*)]  R-1  [z-h(x*)] 

The  first  of  these  terms  is  independent  of  x„  The  second  term  is  approx- 
imated as  follows; 

2 
t  »h(x)-h(x*)  ||    _1  ~      i  [x-x*  j'      {3[(3h'/3x)r1h]  /axj  [x-x*] 

(3.28) 
The  approximation  for  the  third  term  is 

-[h'(x)-h»(x*)]  Rf1  [^(x*)]  *   -fM*]'  (3h»/ax)  R"1  [  z~h(x*)] 

(3.29) 
If  these  approximations  are  substituted  into  (3.26)  the  following  equation 
results; 
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Tt«.  i||  g«vx  ||      -    ^  {  f(x*)  +  (af/ax)    [x-x*]  ] 

2 
+  i  ||z-h(x*)  ||      ,+ifx-x*]*     {  2)[(ah'/3x)R°1h]  /SxJ   [x«x*J 
R~ 

-  [2B-S]1   (sh'/ax)R"1    [z-h(x*)]  (3o30) 

An  analytic  solution  of  (3<>30)  may  be  obtained  by  assuming  that  V(x(t)Dt) 
is  quadratic  in  x0  Specifically, 

V(x(t),t)  =  i  l|x(t)-x*(tlt)  ||  „  1    +  k(t)  (3.3D 

P   (t) 

It  is  straightforward,  but  algebraically  involved,  to  substitute 

into  (3,30)  the  V  and  V  obtained  from  (3,31)  to  obtain  the  following 
x     t 

equations  which  describe  the  estimator; 

dx*(t|t)/dt  =  f(x*(t|t),t)  +  P*(t)(ah'(x*(tlt)9t)/ax)R"1(t) 

[z(t)=h(x*(t|t),t)]  (3*32) 

P*(t)  =  (3f/3x)P*(t)  +P*(t)(af«/ax)  +  G(t)£(t)G'  (t) 

-P*(t)  {a[(3hc/ax)r1h]/axjp*(t)  (3.33) 

The  approximate  solution  to  the  problem  of  estimating  the  state 
variables  of  a  nonlinear  system  in  real  time  is  given  by  (3° 32)  and 

(3o33)»  For  linear  systems  these  equations  reduce  to  those  obtained  by 

*        * 
Kalman  and  Bucy„  The  initial  conditions  for  P  (t)  and  x  (t|t)  are  the 

P(tQ)  and  x(t0|t0)  g^ven  by  ^e  £  priori  distribution  for  the  initial 

state  x(tQ). 

A  block  diagram  of  the  nonlinear  estimator  in  a  control  system 
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Fig„  11   Nonlinear  estimator  in  a  control  system 
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is  given  in  Fig»  1 1 .  In  the  diagram  the  control  vector  u  is  shown  ex- 
plicitly o  Again  we  find  the  intuitively  appealing  design  which  places 
a  model  of  the  basic  system  in  a  feedback  loop» 

If  the  random  input  w  enters  the  system  nonlinearly  we  may  retain 
the  basic  structure  of  the  estimator  if  we  replace  G(t)  by  G(x  (t!t)„t) 
or  (3f(x  (t|t)90J>)/aw)  »-  whichever  is  appropriate. 

A  first  approximation  to  the  solution  of  the  problem  of  estimating 
x(7')  for  t  -  f   -  t  is  given  by  x  (1  |t)  which  can  be  obtained  by  backward 
integration  of  the  following  equation  using  x  (tit)  as  the  boundary 
condition^ 

3x*(Y|t)/aV   =  f(x*(tlt)9l)  +G(T)£(Y)G»(Y)Vy 

5  (3.3*) 

306   A  Two- Point  Boundary  Value  Problem 

The  two- point  boundary  value  problem  obtained  by  Bryson  and 
Frazier  [10J  using  the  calculus  of  variations  follows  directly  from  the 
dynamic  programming  point  of  view0  We  shall  consider  the  most  general 
case  in  which  the  noise  enters  nonlinearly  as  described  by  (3«15)  through 
(3o18)0  The  procedure  we  follow  is  based  on  the  one  used  by  Dreyfus  [16] 
in  his  study  of  the  problem  of  Mayer  in  the  calculus  of  variations  „ 

In  order  to  simplify  the  resulting  expressions  the  arguments  x9 
w  and  t  usually  will  be  omitted  when  no  confusion  is  likely  to  result „ 
The  following  relation  follows  from  the  rules  of  differentiation 5 


dVx/dt  =  3V  /at  +  (av/-3x)x 


Differentiating  (3d 8)  with  respect  to  x  yields 


(3o35) 
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aVx/5t  =  -(sf'/sx)Vx-  (©Vx/sx)x 

-(3h'/3x)f1  [z-h(x,t)] 

-(3W'/3X)  [  (3f«/3H)  V   -  £"1W  ]  (3o36) 

Noting  that  by  (3 J 7)  the  coefficient  of  (3w'/ax)  in  (3.36)  is  zero, 
we  may  combine  (3<>35)  and  (3.36)  to  obtain  the  following  equation; 

dVx/dt  =  -  (3f«/3x)Vx  -  (3h'/ax)R"1  [z-h(x.t)]        (3.37) 

This  equation9  together  with  (3.15)  and  (3.17 )»  are  the  basic  relations 
for  the  two-point  boundary  value  problem.  At  the  final  time  t  we  have 
the  following  natural  boundary  condition  which  corresponds  to  the 
requirement  that  x(t|t)  is  the  value  of  x(t)  for  which  V(x(t),t)  is 
minimum; 

V  (t)  =  0  (3.38) 

The  boundary  condition  at  the  initial  time  t  is  obtained  by  differentia- 
ting (3.10).  This  yields 

x(tjt)  =  x(tQ|to)  +  P(t0)Vx  (3.39) 

The  quantity  V  plays  the  role  of  the  Lagrange  multiplier  of  the 
calculus  of  variations  [15»  16],  If  we  let  V  =  ^(t),  then  the  two-point 
boundary  value  problem  may  be  written  in  the  following  form  which  closely 
resembles  that  obtained  by  Bryson  and  Frazier; 

3x(?it)/3r  =  f(x(f|t)tw(v)fr)         t0^r  it    OM) 

w(T)  =  ^)(3I'/SH)^)  t0^r^t     (3.41) 


£<f)  =  -  (af«/sx)  X(r)  -  (&h«/ax)  R~1  <tO  [  z(r)  -  h(x(7|t)tr)] 

The  boundary  conditions  are 

X(t)  -  o  (3.43) 

x(t  It)  =  x(t  It  )  +  P(t  )  X(t  )  (3.44) 

—   q       ""00        0      0 

For  the  important  special  cases  in  which  the  random  input  w  enters 
linearly,  the  equations  for  x(?lt)  and  \(i)   become 

BxCnt)/^      =  f(x(rit),r) +G(r)s»(r)G'(r)  2(r)  (3.45) 

iii)  =-  (3f»/3i)  XCr)  -  (3h'/?x)  r"V>  [  a(y)  -h(i(nt),r)] 

(3.46) 

We  shall  have  more  to  say  about  the  two-point  boundary  value 
problem  later  when  we  discuss  the  relationship  between  the  estimation 
problem  and  control  problems,, 


In  certain  applications  observations  of  a  continuous-time  system 
can  conveniently  be  made  at  discrete  points  in  time.  We  shall  discuss 
this  situation  only  briefly,,  since  it  requires  only  minor  modifications 
of  our  previous  results  0 

Consider  the  system 

x(t)  =  f (x(t)9t)  +  G(t)w(t)  (3.47) 

Suppose  the  observations  are  made  at  equally  spaced  intervals  of  length 
To  The  observed  signal  is 


.91- 


z(kT)  =  h(x(kT),kT)  +  v(kT)  (3c^8) 

We  again  assume  that  the  random  input  w(t)  and  the  measurement  noise  v(kT) 
are  Gaussian  with  zero  meanc  The  co variance  matrices  for  v  and  w  are  as 
follows ; 

E[w(t)w«(T)]   =  fi(t)  cT(t-T) 

E[v(kT)v«(jT)]  =  R(kT)    S.k 
E[w(t)v'(kT)]  =  0 

If  we  are  between  observations  n  and  n+1  (nT  -  t  ^  (n+1 )  T),  then  by- 
analogy  to  previous  work,  we  may  minimize  the  following  functional; 

2         t  2 

Kt)  =  i|x(o)-x(o|o-)||   .    +  f         iIw(V)||  -     dr 

n  2 

+  Yl     ill  z(kT)-h(x(kT)fkT)  ||   1     nT  ^  t  <  (n+1  )T 
k=o  R~  (kT) 

subject  to  the  constraint  (3^7  )<> 

Here  we  have  assumed  for  convenience  that  observations  begin  at 
t=o„  Note  that  because  of  the  sampled  nature  of  the  observations,  the 
estimate  will  be  discontinuous  at  the  sampling  times 0  We  use  x(o|o")  to 
designate  the  mean  of  the  a  priori  distribution  for  the  initial  state „ 
This  corresponds  to  m  of  the  discrete-time  problem,,  Similarly,  x(o|o  ) 

A 

would  correspond  to  x(o|o)  of  the  discrete-time  problem. 

Using  a  standard  engineering  technique,  we  let  i(t)  be  a  train 
of  unit  impulses  with  the  impulses  occurring  at  the  sampling  times  kT. 
Then  (3.^9)  may  be  rewritten  in  the  following  form  which  closely  resembles 
(3.2); 
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O  4.  O 

:(t)  =  i||x(o)4(o[o-)I  .   +  /  {  i(r)||z(f)^(x(y),r)  II  . 


R-'M 


2     , 
+  II  v(/)ll   „     df 


AT  W)J  (3*50) 

subject  to  the  constrant  (3.^7).  Proceeding  formally 


2 


Vt  -  -  iV.Ggg'V  -V    f(x(t),t)  +i  i(t)IU(t)~h(x(t)9t)|| 

(3.51) 
Note  that  V+  is  discontinuous  at  the  sampling  times  so  that  the  limit 

used  to  obtain  (3.51 )  by  analogy  to  (3*7)  does  not  exist  at  the  sampling 

instants „  Proceeding  in  a  purely  formal  manner,  we  now  use  the  same 

approximation  technique  that  was  employed  in  section  3.5  to  obtain  the 

following  equations; 

dx*(tlt)/dt  =  f(x*(tlt),t)  +  i(t)  P*(t+)  (^h»/Bx)r1  [  z(t)-h(x*(tlt),t)J 

(3.52) 
P*-1  =  .  P*~1(^f/3x)  -  (3f»/^x)  P*~1  -  P*"1  GgG'  P*-1, 

+  i(t)  {  3[(3h»/3x)  R°1  h]  /sx]  (3.53) 

At  the  sampling  instants  the  identity  P  °  =.  PP°  P  breaks  down0 
Between  the  sampling  times  we  have 

P*(t)  =  (3f/3x)  P*(t)  +  P*(t)  (3f»/3x)  +G(t)£(t)G'(t) 

for  kT  <  t  <   (k+l)T    (3.5^) 
At  the  sampling  ingtants  we  have  the  following  formula  resembling  the 
results  of  the  discrete- time  problem; 

P*(kT+)  =  [i  +  P(WT)  {  3[(3h'/ax)  R=1  h]/<Dx]]"1  P(kT~)    (3.55) 

Note  that  the  values  of  P*(kT+)  and  ("3  h/3x)  are  only 
needed  at  the  sampling  times 0   The  estimator  could,  therefore 
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consist  of  an  analogue  model  of  the  basic  system  in  conjunction  with  a 
special  purpose  digital  computer „  Such  a  hybrid  scheme  would  simultaneously 
avoid  the  difficulties  of  approximating  the  nonlinear  system  on  a  digital 
computer  and  performing  a  large  number  of  multiplications  on  an  analogue 
computer o 

^ 08   Relation  Between  the  Estimation  Problem  and  Control  Problems 

The  fact  that  the  solution  of  the  linear  filtering  problem 
is  formally  the  same  as  the  solution  of  the  problem  of  controlling 
a  linear  system  with  quadratic  performance  criteria  was  first  pointed 
out  by  Kalman  [31]  and  the  existence  of  this  duality  is  by  now  well 
known0  Since  this  duality  has  received  ample  coverage  in  the  literature 
[259  29 ,  31 ,  32 „  33 o  3^]  we  shall  not  dwell  on  this  point  except  to 
mention  that  the  nature  of  this  duality  is  made  especially  clear  by  the 
present  formulation  of  the  estimation  problem  in  which  the  estimation 
procedure  involves  the  minimization  of  a  quadratic  "cost"  functional e 

The  close  connection  between  the  observability  of  a  system  and 
the  controllability  of  the  adjoint  system  will  be  discussed  in  Chapter  IV0 

It  should  be  clear  from  the  present  formulation  that  there  is  a 
close  connection  between  the  nonlinear  estimation  problem  and  certain 
nonlinear  control  problems 8  For  example „  consider  the  following  system ° 


x(t)  =  f(x(t),t)  +  G(t)u 
Z(t)  =  h(x(t)9t) 

Suppose  that  we  want  the  output  y_(t)  to  follow  a  known  function  z(t) 
and  agree  to  choose  the  control  u  r    -,  to  minimize  the  following  per* 
formance  functional; 


.9^ 


i       f     fl«(t)  -h(x(t),t)||   1    +  ||u(t)||   . 

W   L  re*)         Vet) 


dt 


This  functional  is  a  weighted  combination  of  a  quadratic  measure  of  the 
error  and  the  total  "energy"  expended „ 

Various  methods  may  be  used  to  attack  this  problem 0  We  shall 
find  it  convenient  to  use  the  Hamiltonian  point  of  view,  so  much  in  vogue 
in  the  current  control  literature „  Since  this  section  is  an  aside,  we 
shall  assume  that  the  reader  is  familiar  with  the  basic  concepts c  For 
extensive  treatment  of  these  and  related  matters,  the  reader  is  referred 
to  [3,  5,  15,  16,  30,  52,  55,   61]. 

Let  us  define  the  following  Lagrangian; 

2  2 

L(x(t),u(t),t)  =  |  ||z(t)  -  h(x(t),t)  ||      +l|u(t)  || 

R-1(t)       £~1(t) 

Using  the  "Maximum  Principle",  we  consider  the  following  Hamiltonian; 

H(x(t),£(t),t)  =  Max   J  £i(t)  [  f(x(t),t)  +  G(t)u(t)]  -  L(x(t),u(t),t) 
u(t) 

The  maximizing  value  of  u(t)  is  easily  obtained  by  differentiation 

u*(t)  =fi(t)G»(t)£(t) 

Substituting  this  value  into  the  expression  for  H  yields 

2  2 

H(x(t),E(t),t)  =2«(t)f(x(t),t)  +il|G«(t)£(t)||    -  i||z(t)=h(x(t),t)  ||   1 

fi  H  (t) 

The  canonical  equations  for  this  problem, 
x(t)  =  3H/ap_ 

£(t)  =  -  aH/ax 
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are  the  same  as  (3,^5)  and  (3«^6)  if  we  equate  p_(t)  and  >.(t).  Further, 
we  can  easily  recognize  (3<>9)  as  the  Hamilton-Jacobi  partial  differential 
equation  [15,  16,  30J  i 

Vt  +  H(x(t),V  ,t)  =  o 

We  can  make  the  boundary  conditions  the  same  as  those  for  the  estimation 
problem  by  leaving  the  final  state  x(T)  unspecified  and  by  assigning  a 
quadratic  "cost"  II  x(t  )  -  x  (t  )  i  for  starting  in  a  state  other 

r1(t0) 

than  a  nominal  initial  state  x  (t  )„  Specifying  the  initial  state  would 
correspond  to  the  smoothing  problem  in  which  the  initial  state  were  known 
exactly;  that  is9  P(t  )  =  0o 

The  regulator  problem  z(t)  =  o  corresponds  to  a  smoothing  problem 
in  which  the  observed  value  of  signal  plus  noise  is  equal  to  zero  on 
[t0„T]o  While  such  a  set  of  observations  is  extremely  unlikely,  it  is, 
nevertheless ,  the  most  likely  curve  z  r.   m-i  for  a  stable  system  with  the 
mean  of  the  a  priori  distribution  for  x(t>)  equal  to  zeroc 

While  this  formulation  of  the  control  problem  is  somewhat  arti- 
ficial since,  except  for  the  regulator  problem,  the  desired  output  is 
usually  not  known  in  advance,  the  point  we  wish  to  make  is  that  the 
problem  of  finding  the  minimum  energy  control  so  that  the  output  of  the 
system  follows  a  desired  trajectory  using  a  quadratic  performance  criteria 
is  mathematically  identical  with  the  problem  of  finding  the  most  likely 
random  input  after  making  noisy  observations  of  the  output „ 
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CHAPTER  IV 
OBSERVABILITY  AND  CONTROLLABILITY 
4  0 1   Introduction 

The  purpose  of  this  chapter  is  to  give  a  brief  introduction  to  the 
concepts  of  observability  and  controllability  and  to  relate  these  ideas 
to  the  present  formulation  of  the  estimation  problem.  These  concepts 
were  introduced  by  Kalman  [32,  3^]  and  have  been  studied  by  him  [33]  and 
his  associates  [ 70] „     The  paper  by  Bertram  and  Sarachik  represents  an 
early  contribution^ [69]  » 

We  shall  confine  our  attention  to  discrete- time  systems  since 
certain  problems  arise  in  discrete- time  systems  which  do  not  exist  in 
their  continuous-time  counterparts  and  since  the  results  are  easily  ex- 
tended to  continuous-time  systems  which  have  been  studied  more  extensively 
in  the  recent  literature  [33 0  70] „ 

The  definitions  of  controllability  and  observability  which  we 
shall  introduce  will  differ  slightly  from  those  given  elsewhere c  The 
modified  definitions  will  enable  us  to  include  linear  systems  with  singular 
transition  matrices  and  —  in  principle  —  nonlinear  systems  0  They  will 
also  simplify  the  development  of  the  relation  between  observability  of 
a  system  and  controllability  of  the  adjoint  system  which  is  of  particular 
significance  in  the  estimation  problem,, 

The  material  in  section  B6  of  Appendix  B  will  be  continually  re- 
ferred to  and  provides  necessary  background  for  this  chapter. 


Consider  the  following  linear  discrete-time  system; 
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x(k+1 )  =  F(k)x(k)  +  G(k)u(k)  (4J ) 

Z(k)  =  H(k)x(k)  (4„2) 

In  section  1,3  we  showed  that  the  forward  transition  matrix  for  this 
system  may  be  expressed  in  the  following  form; 

I(m+1,k+1)  =  F(m)F(m-1)  "   •  •  F(k+1 )  m  >  k         (403) 

The  adjoint  system  of  (4J)  and  (402)  is  defined  to  be 

x(k«1 )  =  F»  (k)x(k)  +  H'  (k)u(k)  (4.4) 

y_(k)  =  G»(k)x(k)  (4.5) 

The  backward  transition  matrix  for  the  adjoint  system  is 

£(k,m)  =  F»(k+1)  •  •  •  F1 (m~1 )F8 (m)  m  >  k         (4.6) 

Comparing  (4.3)  and  (4.6)  we  obtain  the  following  important  identity; 

I(n+1,k+1)  =  *c(k,m)  (4.7) 

The  relation  between  a  system  and  its  adjoint  will  be  discussed  in  some 
detail  in  the  sequel . 


The  question  of  observability  has  to  do  with  the  basic  structure 
of  the  system  under  consideration .  Roughly,  the  question  is  this;  "Given 
a  finite  sequence  of  exact  measurements  of  the  output  of  an  unforced  system, 
can  one  uniquely  reconstruct  the  corresponding  state  sequence?"  If  the 
answer  is  yes„  the  system  will  be  called  completely  observable  on  the 
sequence  of  times  during  which  the  measurements  are  made0 
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In  order  to  develop  some  insight  we  shall  first  examine  the 
question  of  observability  for  a  time-invariant  discrete-time  system 
before  giving  a  more  precise  definition  of  observability  and  considering 
its  ramifications.  Consider  the  following  system; 


x(k+1  )  =  F  x(k)  +  G  u(k) 


£(k)  =  H  x(k) 


(4.8) 


(4.9) 


with  the  state  vector  x,  and  n- vector 9  and  the  output  £.  If  the  system 
is  unforced^u^E  o.  Suppose  that  we  make  n  consecutive  measurements  of 
the  output  y_«  For  convenience  we  assume  that  the  first  measurement 
occurs  at  time  zero.  Clearly,  this  involves  no  loss  of  generality  in  the 
time-invariant  case.  We  may  then  write  the  following  equations  relating 
the  output  sequence  [ ^(o)9(>o<.  »£(n-1  )  \  and  the  initial  state  x(o); 

£(o)  =  H  x(o) 
ZO)   =  H  F  x(o) 


£(n=1 )  =  H  Fn=1  x(o) 


or,  equivalently 

Z(o) 
Z(D 


H 

H  F 

o 

o 

9 

H  F11^ 

5(o) 


Z(n~1 ) 


We  wish  to  know  whether  or  not  the  initial  state  x(o)  can  be  determined 


(4.10) 
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uniquely  as  a  result  of  the  measurements  of  the  output,,  Once  the  initial 
state  x(o)  is  known,  the  sequence  of  states  [x(k)\    (for  k  -  o)  of  the  un- 
forced system  is  given  by  (4.8) „  We  implicitly  assume  that  (4.10)  has  a 
solution;  that  is  the  output  sequence  does  not  contradict  the  model  of 
(4.8)  and  (4.9).  From  elementary  matrix  theory  we  know  that  x(o)  can  be 
determined  uniquely  if  and  only  if  the  matrix  on  the  right  side  of  (4.10) 
or,  equivalently,  its  transpose  [h5,  F'H"  ,«„ «.  ,F!  ~  H!]  has  rank  n.  More- 
over, if  the  initial  state  of  the  time-invariant  system  cannot  be  deter- 
mined  after  n  observations  of  the  output,  additional  observations  will 
convey  no  new  information  because,  by  the  Cayley-Hamilton  theorem  [17], 
any  positive  integer  power  of  an  n  by  n  matrix  F  may  be  expressed  as  a 
linear  combination  of  I,  F,.,,,  F31   and,  therefore,  additional  observations 
are  simply  linear  combinations  of  the  first  n  observations 0 

In  devising  a  workable  definition  of  observability  for  discrete- 
time  systems  we  are  faced  with  a  problem  that  does  not  arise  in  continuous- 
time  systems,  namely  that  a  discrete-time  system  may  be  described  by  a 
forward  difference  equation,  for  example  (4.8);  or  by  a  backward  differ- 
ence equation,  for  example  (2e3^)» 

Consider  the  forward  system 

x(k+1)  =  f(x(k),k)  (4.11) 

and  the  backward  system 

x(k-l)  =  f(x(k),k)  (4.12) 

the  output  of  both  systems  being  given  by  an  equation  of  the  following  form 5 

Z(k)  =  h(x(k),k)  (4.13) 
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In  order  to  include  both  types  of  systems,  we  propose  the  following 

definition  of  observability. 

Definition ;  A  discrete- time  system  will  be  called  completely  observable 

on  a  set  of  times  (j,  j+1  , ...,m~1 ,mj  if,  by  observing  the  output  sequence 

{y.(j)»<>oo,y„(m)  j  ,  one  can  uniquely  determine  the  state  sequence  [x(j)9 ...  ,x(m)j 

for  all  possible  state  sequences . 

It  is  clear  that  for  the  forward  system  (4.11)  and  (4.13)  it  is 
necessary  and  sufficient  to  determine  x(j)  an(i  f°r  the  backward  system 
(4.12)  and  (4.13)  it  is  necessary  and  sufficient  to  determine  x(m). 

4.4   Definition  of  Controllability 

The  question  of  controllability,  like  the  question  of  observability, 
has  to  do  with  the  basic  structure  of  the  system  under  consideration. 
Roughly,  the  question  is  this 5  "Given  any  two  states  X.  an(i  2b »  can  one 
choose  a  finite  input  sequence  in  such  a  manner  that  the  corresponding 
state  sequence  will  be  of  the  form  [  X| ,... jXg] ?"  If  the  answer  is  yes, 
the  system  will  be  called  completely  controllable  on  the  set  of  times 
during  which  the  state  sequence  occurs. 

In  order  to  develop  some  insight  we  shall  again  consider  the 
question  of  controllability  for  a  time=» invariant  discrete-time  system 
before  giving  a  more  precise  definition  of  controllability.  Consider 
the  following  system 5 

x(k+1 )  =  F  x(k)  +  G  u(k)  (4.14) 

with  the  state  vector  x,  an  n- vector,  and  the  control  input,  u,  Suppose 
that  we  wish  to  move  the  system  from  a  known  initial  state  x(o)  to  a 
desired  state  x^  at  time  n.  We  may  then  write  the  following  equations 
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relating  the  state  sequence  and  the  control  sequence; 

x(0  =  F  x(o)  +  G  u(o) 

x(2)  =  F2  x(o)  +  F  G  u(o)  +  G  u(1 ) 


x,(n)  =  F/1  x(o)  +  F^1  G  u(o)  +  •  •  *  +  Q  u(n-1  ) 


We  may  rewrite  this  last  equation  in  the  following  form; 


(M5) 


x,(n)  -  f1  x(o)  =  [  f^   G,000,  FG,  G] 


^d 


u(o) 


u(n»1  ) 


(^.16) 


We  see  that  a  solution  of  (4.16)  will  exist  if  and  only  if  x^n)  -  f   x(o) 
is  a  linear  combination  of  the  column  vectors  of  the  matrix  [F11"  G,,0,,  F  G,G]; 
in  which  case  x,(n)  -  F11  x(o)  belongs  to  the  range  of  that  matrix.  If  the 
matrix  [  Fj1"  G,.,,,  F  G„G]  has  rank  n,  then  any  final  state  may  be  reached 
from  any  initial  state  in  n  transitions e  Moreover „  by  the  Cayley-Hamilton 
theorem,  any  state  which  can  be  reached  from  x(o)  can  be  reached  in  (at 
most)  n  transitions o  For  the  special  case  of  a  single  input  systems,  the 
matrix  G  becomes  a  vector  &  and  we  obtain  the  results  that  every  final 
state  may  be  reached  from  every  initial  state  if  and  only  if  the  matrix 
[J;  ~  £,eO0!>  F  g_,  g.]  is  non-singular  and  that  any  state  which  can  be 
reached,  can  be  reached  in  (at  most)  n  transitions 0  These  results  are  well 
known  [32,  69]  and  are  the  basis  for  the  design  of  "dead  beat"  systems. 

In  devising  a  workable  definition  of  controllability  we  are  again 
faced  with  the  problem  of  forward  and  backward  difference  equations.  Consider 
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the  forward  system 5 

x(k+1)  =  f(x(k),  u(k),k)  (4.17) 

and  the  backward  system? 

x(k»1)  =  f(x(k),  u(k),k)  (4.18) 

In  order  to  include  both  types  of  systems „  we  propose  the  following  defini- 
tion of  controllability . 

Definition  %       A  discrete- time  system  will  be  called  completley  controllable 
on  a  set  of  times  [j„  j+1 900. ,  ro-1 „m]  if  for  every  x.  and  x~  a  control 
sequence  [  u(j)9...  „u(m)  ]   exists  such  that  x(j)  =  x..  and  x(m)  =  Xo» 

It  is  clear  that  for  the  forward  system  (4.17)  only  the  sequence 
{  u( J )  •  •  °  °  »3i(m~''  )  1  ^s  pertinent  and  for  the  backward  system  (4.18)  only 
the  sequence  { u( j+1 ) , . . „ 9u(m) }  is  pertinent. 

4.5   Controllability  and  Observability  for  Linear  Pis crete- Time  Systems 

The  questions  of  observability  and  controllability  for  time-varying 
linear  discrete-time  systems  are  closely  related  to  those  for  the  time- 
invariant  systems  already  discussed,.  Let  us  consider  the  system  of  (4.1) 
and  (4.2)  which  we  rewrite  for  convenience 

x(k+1  )  =  F(k)x(k)  +  G(k)u(k)  (4.1 ) 

l(k)  =  H(k)x(k)  (4.2) 

If  we  observe  the  output  sequence  [  £(j),88,  vZW]   for  the  unforced  system,, 
u  5  o9  then  we  may  write  the  following  equation  relating  the  output  sequence 
[Z(j)»«oo  ^(m)  ]  to  the  initial  state  x(j); 
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zti) 

Z(  j+1 ) 


Z(m) 


H(j) 

H(j+1)  I(j+1,j) 


H(*-*>0  I(m«j; 


x(3) 


<M9) 


Since  this  is  just  a  special  case  of  the  equation  Ax  =  £  which  is  con- 
sidered in  Appendix  B,  we  may  apply  lemma  B601  to  obtain  the  following 
result. 

Lemma  4„51  «   The  system  (4e1)  and  (4C2)  is  completely  observable  on  the 
set  of  times  {j,j+1  ,oo<,8m-1  0m ]  if  and  only  if  the  following  matrix  is 
positive  definite; 


m 


Mf(j,m)  =  Yl  I'(k9j)H»(k)H(k)t(k0j) 


k=j 


(4.20) 


If  Ep(j„m)  is  positive  definite 9  then  the  unique  solution  for  x(j)  is 


m 


x(j)  =  Mf°1a^)    f^i'dc.j^dc^k) 


(4.21 ) 


From  (4„21 )  we  see  that  it  is  not  necessary  to  know  the  sequence 
{£(j)»*eo,y.(m)J  but  merely  the  following  sum; 


in 


I]  I'(k„j)H'(k)v.(k) 
k=j 


Note 


that  if  M(j„m)  is  positive  definite,  then  so  is  M»(j,m+k)  for  k  >  o, 
Let  us  now  consider  the  question  of  controllability  of  the  same 
system,  (401)0  If  we  consider  the  control  sequence  [  u(j)sooo ,u(m»1 )j , 
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then  we  may  write  the  following  equation  relating  the  states  x(j)  and 
x(m)  and  the  control  sequence 


x(m)  -  £(m,j)x(j)  =  [i(mJ+1)G(j)M.MG(m-l)] 


u(j) 


u(m-1  ) 


(4.22) 


Again  this  is  just  a  special  case  of  the  equation  Ax  =  y_  and  we  immediately 
obtain  the  following  lemmas  as  special  cases  of  lemmas  B6„2  and  B6.3  re- 
spectivelye 

LemmaJ+2^2;  The  state  x(m)  may  be  reached  at  time  m  from  the  state  x(j)  at 
time  j  if  and  only  if  x(m)  =  I(m9 j)x(j)  belongs  to  the  range  of  the  follow- 
ing matrix j 


m-1 


Wf(j9m)  =  V]   l(ml)k+1)G(k)G,(k)|>»(m()k+1) 


k=J 


(4o23) 


In  this  case,  the  solution  which  minimizes 


m-1         2 

£   i||s(k)l| 

k=j 


is 


#/. 


u(k)  =  G«(k)l1(m,k+1)w/(;),m)  [  x(m)  -  I(m,  j)x(j)]    k=j ....  ,nt-1  (4.24) 


Lemma  4.^;  The  system  (4.1 )  is  completely  controllable  on  the  set  of  times 
{j.J+1 ,0.0, m-1  ,m]  if  and  only  if  W^-Cj^m)  is  positive  definite.  In  this  case 
the  solution  which  minimizes 
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xs 


m-1  2 

Z]   HbOO  II 

k=0 


u(k)  =  G'(k)I,(m,k+1)Wf"1(j,m)  [  x(m)  -  I(m,j)x(j)]   k=j , . . . ,m-1  (4C25) 


Let  us  now  consider  a  system  described  by  a  backward  difference 
equation,,  In  particular,  let  us  consider  the  adjoint  system  given  by  (4.^) 
and  (405)  which  we  rewrite  for  convenience 


x(k-1)  =  F'(k)x(k)  +H,(k)u(k) 


l(k)  =  G'(k)x(k) 


<*.*) 


<*.5) 


In  order  to  examine  the  question  of  observability  we  write  the  following 
equation  relating  the  output  sequence  {yp(j),o..,^(m)j  and  the  state  x(m) 
for  the  unforced  system; 


Z(j) 


Z(n-1 ; 


G,(J)f(J.n) 


G«  (m-1 )  *(m~1  ,m) 
G'(m) 


x(m) 


(^o26) 


Again  we  note  that  (4026)  is  .just  a  special  case  of  the  equation  Ax  =  y 
and  we  immediately  obtain  the  following  lemma  as  a  special  case  of  B6tt10 

Lemma;  k0^i     The  system  (^o^)  and  (405)  is  completely  observable  on  the 
set  of  times  {  j , ,  0  0  „m  j  if  and  only  if  the  following  matrix  is  positive 
definite ; 
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m 


MbQ,m)  =  J]  *«(k,m)G(k)G'(k)S(k,m) 
k=j 


(4.27) 


If  M  (j9m)  is  positive  definite  the  unique  solution  for  x(m)  is 


m 


x(m)  =  Mb"1  ( j  9m)   V  f » (k,m)G(k)y.(k) 


(4.28) 


Using  the  identity  (4.7)  we  may  write   (4.27)  as  follows; 


m 


M.  (j,m)  =       V  I(m+1  ,k+1  )G(k)G»  (k)S'  (m+1  ,k+1 )  =  W.(j,m+1 ) 

"*  k^~  ~      "        "  "*  (4.29) 


A  comparison  of  (4.23)  and  (4.29)  yields  the  following  theorem. 

Theorem  4.55;  The  system  (4.1 )  and  (4.2)  is  completely  controllable  on 
the  set  of  times  {  j,...,m+1 \   if  and  only  if  the  adjoint  system  (4.4)  and 
(4.5)  is  completely  observable  on  the  set  of  times  { j9...9m}. 


Let  us  now  examine  the  question  of  controllability  of  the  adjoint 
system.  We  consider  the  control  sequence  [  u(j+1  )<>... 9u(m)j  and  obtain 
the  following  equation  relating  the  states  x(j)  and  x(m)  and  the  control 
sequence  j 


x(j)  -  I(j,m)x(m)  =  [H'(j+1),.„.,f(j,m-2)H»(m-l),t(j,m-1)H,(m)] 


u(  j+1  ) 


u(m-1 ) 
u(m) 


(4.30) 
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Again  this  is  just  a  special  case  of  the  equation  Ax  =  y_  and  we  obtain 
the  following  lemma  as  a  special  case  of  b6.3» 

Lemma  4.56;  The  system  (4.5)  is  completely  controllable  on  the  set  of 
times  {j,j+1  .....m-ljnjif  and  only  if  the  following  matrix  is  positive 
definite ; 


m 
W,  ( J  ,m)  =    YL       I<3  'k"1  >!'  (k)H(k)!'  (J ,  k-1 ) 


If  Wu(j,m)  is  positive  definite  the  solution  which  minimizes 


m  2 

£     i  lack)  l 

k=j+1 


is 


u(k)  =  H(k)l'(j9k-1)Wb~1(j,m)  [x(j)  -  £(j,m)x(m)]  k=j+1,„..,m 

o  o  o  \* e j£ ) 

Using  the  identity  (4.7)  we  may  rewrite  (4.31 )  in  the  following 


form; 


m 
Wb(j»m)  =       £     rOc.j+OH'MHOOiOcj+O  =  Mf(j+1,m) 

k=j+1  (4C33) 

A  comparison  of  (4.20)  and  (4.33)  uields  the  following  theorem. 

Theorem  4.57;  The  system  (4.1)  and  (4.2)  is  completely  observable  on 
the  set  of  times  (j-gj+l  ,  ...,m}  if  and  only  if  the  adjoint  system  (4.5) 
and  (4.6)  is  completely  controllable  on  the  set  of  times j j-1 9j,..„ ,m]. 


It  is  interesting  to  note  that  the  lemmas  given  elsewhere  [70 J 
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correspond  to  the  results  we  have  obtained  for  the  backward  system.  For 

— 1 
discrete- time  systems  in  which  F  (k)  exists  and  for  continuous-time  sys- 
tems the  distinction  between  forward  and  backward  equations  does  not  arise. 
In  these  cases  the  basic  lemmas  may  be  stated  in  alternate  forms  corres- 
ponding to  those  we  have  obtained  for  the  forward  and  backward  systems. 
We  may  easily  relate  theorem  4.57  to  the  general  solution  of  the 
linear  smoothing  problem  given  in  Appendix  D.  Consider  the  case  in  which 
the  random  input  w  and  the  measurement  noise  v  are  uncorr elated.  For 
convenience  we  rewrite  (D.4?)  in  the  following  form; 

X(k-1 )  =  F« (k)^(k)  +  H} (k)  [  H(k)P(k)H« (k)  +  R(k) ] 

(z(k)  -  H(k)  [  x(klk-1)  +  P(k)F' (k)^(k)J  ]  (4.34) 

In  addition  from  (D.48)  we  obtain 

x(oln)  =m  +  P(o)^(-1)  (4.35) 

From  (4.35)  we  see  that  the  estimate  of  the  initial  state  may  take  on 
all  possible  values  only  if  ^  (-1 )  can  take  on  all  possible  values „  but 
from  (4.34)  we  see  that  \^~1 )  can  take  on  all  possible  values,  only  if 
the  adjoint  system  is  completely  controllable  on  the  set  of  times 
[-1,o,,,,,n],  The  solution  of  the  linear  estimation  problem  given  in 
Appendix  D  is  optimal  even  if  the  system  is  not  completely  observable 
and  the  optimal  procedure  is  simply  to  estimate  the  a  priori  mean  for 
those  states  which  are  unobservable. 
If  we  note  that  the  expression 

z(k)  -  H(k)  [x(k|k-l)  +  P(k)F»  (k)\(k)  ] 

appearing  in  (4.34)  is  the  estimate  of  the  measurement  noise  v(k)  we 

-109- 


see  that  an  absurd  situation  would  arise  if  the  system  were  not  com- 
pletely observable  and  the  adjoint  system  were  completely  controllable. 
For  then  we  would  be  using  the  estimate  of  the  measurement  noise  to  modify 
the  £  priori  estimate  of  certain  state  variables  when  the  measurements 
themselves  contained  no  information  about  those  state  variables. 

We  may  also  relate  theorem  4.55  to  a  simple  control  problem.  Con- 
sider the  problem  of  moving  the  system  (4.1)  from  a  specified  initial 
state  x(j)  to  a  desired  state  x,  in  n  transitions  using  the  control 
sequence  for  which  the  "energy"  expended  is  minimum.  "Energy"  is  defined 
to  be 

m-1         2 
k=j 

To  solve  this  problem  we  minimize  the  following  expression; 

m-1  2 

E       *  II  EGO  II       +  V  00    [  *<k+1 )  -  I(k)x(k)  -  G  (k)u(k)] 

k=o  '  (4.36) 

subject  to  the  constraint 

m°1 
x,(m)  -  I(m,j)x(j)  -  J]  f (m,k+1  )G(k)u(k) 

k?=j  (4.37) 

Setting  the  partial  derivatives  of  (4.36)  with  respect  to  u(k),  X(k)  and 
x(k)  equal  to  zero  yields  the  following  equations; 

x(k+1  )  =  F(k)x(k)  +  G(k)> (k)  (4.1 ) 

>  (k-1 )  =  F« (k)\(k)  (4.38) 

u(k)  =  G«(k)A(k)  (4.39) 
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Note  that  u(k)  is  the  output  of  the  adjoint  system.  From  (4.38)  we  may- 
obtain  the  following  expression; 

X(k)  =  I(k,m-1 )A(m-1 )  (4.40) 

Substituting  (4.38)  and  (4.40 )  into  (4.37)  yields 

m-1 
^(m)  -  £(m,j)x(j)  =  Yj     I(m,k+1)G(k)G,(k)i(k,nt.1)A(B^1)       (4.41) 

Using  the  identity  (4.7)  we  inay  rewrite  (4.41 )  in  the  following 
two  equivalent  forms; 

3^(m)  -  I(m,j)x(j)  =  ^(j.mjACm-l)  ,m  =  j+n        (4.42) 

x^m)  -  I(m,j)x(j)  =  M^j^m-O^m-I)  ,m  =  j+n        (4.43) 

Lemma  4.52  says  that  the  state  x,  may  be  reached  from  x(j)  in  n  transitions 
if  and  only  if  some  ^(m-1 )  satisfies  (4.42).  Lemma  4.53  says  that  every 
state  may  be  reached  from  x(j)  in  n  transitions  if  and  only  if  there  is 
a  unique  X_(m-1 )  satisfying  4.42.  If  we  consider  u(k)  to  be  the  output  of 
the  adjoint  system  then,  by  lemma  4.54,  X_(m-1  )  can  be  uniquely  determined 
on  the  basis  of  the  following  sum; 

m~1 
^(m)  -  I(m,j)x(j)  =  V     I5(m~1,k)G(k)u(k) 

k=J 

if  and  only  if  the  adjoint  system  is  completely  observable  on  the  set  of 
times  {  j,..„  9m-lj .  Theorem  4.56  simply  says  that  the  system  is  completely 
controllable  if  and  only  if  the  Lagrange  multiplier  ^(m-1 )  may  be  determined 
uniquely;  that  is,  if  and  only  if  the  adjoint  system  is  completely  observable 
on  the  set  of  times  [  J,.., ,m«1 j. 
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4,6   Monlinear  Systems 

At  present  almost  nothing  is  known  about  controllability  and 
observability  for  nonlinear  systems,,  A  theorem  on  local  controllability 
of  nonlinear  systems  was  proven  by  Markus  and  Lee  [71 J  by  assuming  that 
the  linear  variational  equations  were  completely  controllable.  The  dis- 
cussion of  their  paper  by  Kalman  is  also  of  interest. 

In  spite  of  the  present  lack  of  understanding  of  these  matters, 
there  still  seems  to  be  a  strong  connection  between  observability  of  a 
nonlinear  system  and  controllability  of  the  adjoint  system.  This  is  in- 
dicated by  the  fact  that  the  ubiquitous  two-point  boundary  value  problems 
involve  the  adjoint  equations.  For  example,  consider  a  system  in  which 
the  state  vector  may  be  divided  into  a  completely  unobservable  part  x? 
and  a  remaining  part  x,  ,  in  such  a  way  that  the  equations  for  the  system 
may  be  written  in  the  following  form; 

XjOg+1)  =  f^OO.uOO.k) 
x£(k+1)  =  f2(x1(k)Bx2(k)„u(k),k) 

y.(k)  =  hCx^k^k) 
If  we  examine  the  adjoint  system 

X(k-1)  =  OP/sxjAdc)  +  (3h«/3x)  v(k) 
or,  more  specifically 


\,(k-o 


^(M) 


3  f  1  /  3  x 


*  4/^ " 

^  GO  " 

+ 

3  f  /ax 
~2       ~2  J 

-  ^2         J 

ah'/a^ 


l(k) 
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we  find  that  there  is  no  trajectory  along  which  the  linearized  adjoint 
system  is  completely  controllable 0 

Similarly,  if  we  consider  a  system  with  obviously  uncontrollable 
state  variables  of  the  following  form; 

Xj  (k+1 )  =  £,  (Xj (k)9x2(k),u(k),k) 
x2(k+1)  =  f^x^khk) 

we  find  that  there  is  no  trajectory  along  which  the  following  adjoint 
system  is  completely  observable; 


A,  (k-1  ) 

3*J/ax 

0 

A,  (k) 

A2(k-0 

ail'/aag 

^(k) 

Z    =    [  3£,/9u 

0] 

"  L,  M 

*   (k) 

.    -2        J 

If  we  were  to  consider  the  problem  of  observing  a  small  perturbation 
of  the  initial  state  x(j)  of  the  system  (4e11)  and  (4.13)  we  would  investi- 
gate the  variational  equations 

Sx(k+1)  =  (9f/sx)  <fx(k) 
^Z(k)  =  (ah/ ax)  <fx(k) 

Complete  observability  of  the  variational  equations  along  the  nominal  tra- 
jectory would  insure  local  observability  of  a  small  perturbation  in  x(j)0 
Complete  observability  of  the  variational  equations  is,  of  course,  the  same 
as  complete  controllability  of  the  adjoint  equations c 
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CHAPTER  V 

EXPERIMENTAL  RESULTS 

'p  o  1   Introduction 

In  this  chapter  we  shall  present  the  results  of  the  application 
of  several  of  the  techniques  of  Chapter  II  to  some  particular  examples. 
The  aim  of  these  numerical  studies  is  to  give  an  indication  of  the  type 
of  performance  that  can  be  expected  when  approximation  methods  are  used 
in  nonlinear  estimation  problems.  With  regard  to  the  approximation 
technique  for  the  filtering  problem  given  in  section  2081  ;  the  behavior 
of  the  P  matrix,  the  stability  of  the  filter,  the  effect  of  known  con- 
trol inputs,  and  the  overall  quality  of  the  estimates,  are  all  of  prime 
importance 0  With  regard  to  the  smoothing  problem;  the  quality  of  the 
first  approximation  sequence  {x  (k|n)j  and  the  convergence  properties 
of  the  successive  linearization  technique,  are  the  basic  factors  to  be 
considered 0  While  one  cannot  draw  positive  general  conclusions  from 
specific  examples,  we  feel  that  the  results  of  these  simulation  studies 
indicate  that  the  methods  under  consideration  hold  great  promise 0 

5q2   Pictures  and  Random  Variables 

The  simulation  studies  discussed  in  this  chapter  were  performed 

on  an  IBM  7090  computer,,  The  results  will  be  displayed  in  a  series  of 

i 

pictures;  each  of  which  represents  a  time  history  of  1000  points,  time 
running  from  left  to  righto  The  scale  for  each  picture  will  be  indicated 
by  giving  the  maximum  in  absolute  value  of  the  variable  or  by  referring 
to  an  adjacent  picture  of  the  same  scale c  Positive  values  of  the  variables 
lie  below  the  horizontal  axise 
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Gaussian  random  variables  were  approximated  by  a  sum  of  twelve 
independent  pseudo-random  variables,  each  having  a  uniform  distribution 
on  the  interval  [  0 , 1  ]  . 

5.3   Example;  As  a  first  example  a  first  order  system  with  a  randomly 
varying  parameter  was  investigated.  Such  a  system  could  be  a  model  for 
a  noisy  RC  network  in  which  the  resistance  varied  in  a  random  manner.  The 
continuous-time  version  of  the  system  is  given  by  the  following  equations; 

Xi(t)  =[a  +x2(t)]  Xl(t)  +u(t)  +  Wl(t)  (5.1) 

x2(t)  =  b  XgCt)  +  w2(t)  (5.2) 

z(t)   =  x,(t)  +  v(t)  (5.3) 

where  u(t)  is  a  known  input;  w..(t),  w2(t)  and  v(t)  are  white  Gaussian 
noise  processes;  z(t)  is  the  observed  output;  and  the  random  parameter 
x2(t)  is  the  output  of  a  linear  system  excited  by  white  Gaussain  noise. 
Because  the  simulation  was  to  be  conducted  on  a  digital  computer ,  this 
system  was  approximated  by  the  following  discrete-time  system; 

x1  (t4h)  =  x1  (t)  +  [a  +  xg(t)]  x1  (t)h  +  u(t)h  +  w1  (t)h       (5.4) 
x2(t+h)  =  x2(t)  +   b  xg(t)h  +  w2(t)h  {^>05) 

z(t)    =  Xl(t)  +  v(t)  (5.6) 

For  a  =  -  1.0,  the  value  of  .01  was  used  for  h.  This  corresponds 
to  a  sampling  period  of  10  milliseconds  for  a  system  with  an  average  time 
constant  of  one  second. 

The  recursive  scheme  of  (2.73),  (2.7*0  a"d  (2.77)  was  used  to 
obtain  estimates  of  x^  and  Xg  in  real  time.  The  successive  linearization 
technique  of  section  2.84  was  used  to  obtain  a  solution  of  the  smoothing 
problem.  The  equations  describing  the  estimation  procedure  for  this 
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problem  can  easily  be  obtained  by  straightforward  substitution  into  the 
basic  equations  of  Chapter  II. 

i 

5.31   Case  I 

The  following  data  pertains  to  the  simulation  results  shown  in 
Plate  I. 

system  parameters:      a  =  -  1 .0 

b  =  -  0.1 
h  =   0.01 

initial  conditions :     x1 (o)  =   1.0 

x2(o)  =  -  0.5 

a  priori  distribution;   m..  =  0.0  P-j-|(o)  =  0»5 

m2  =  0.0  P-|2(°)  =  Oo° 

P22(o)  =  °»6 

noise  levels?  E  [w1  (j)w1  (k)]^  36.0  £» 

E  [w2(j)w2(k)]*   1.0  S$k 

E  [v(j)v(k)]  *   9.0  6jk 

w1 ,  w2  and  v  are  statistically  independent, 
Discussion;  This  data  set  represents  a  rather  severe  noise  condition  as 
indicated  by  parts  B  and  C  of  Plate  I.  The  real  time  estimate  of  x. 
closely  followed  the  actual  value.  The  error  in  the  a  priori  estimate 
of  Xg  was  overcome  soon  after  the  known  input  u  excited  a  transient,  as 
can  be  seen  in  parts  G  and  H. 

The  results  of  the  application  of  the  successive  linearization 
technique  to  the  smoothing  problem  are  shown  in  parts  F  and  I.  The  iter- 
ation procedure  was  terminated  when  the  maximum  deviations  between  the 
ith  and  the  i+1  approximations  for  both  x1  and  x2  were  less  than  one  per- 
cent of  the  maximum  value  of  the  quantity  being  estimated.  This  criteria 
was  satisfied  by  the  third  approximation.  The  difference  between  the 


.116- 


first  three  approximations  could  not  be  detected  in  the  pictures. 
Note  that  in  the  smoothed  solution  for  x2  shown  in  Part  I,  the  error 
in  the  a  priori  estimate  was  overcome. 

The  known  input  u  -was  chosen  as  a  representative  example  of  the 
type  of  saturating  signals  occurring  in  control  systems.  The  total  com- 
puter time  used  for  the  entire  simulation  was  0.73  minutes. 

5.32   Case  II 

The  following  data  pertains  to  the  simulation  results  shown  in 
Plate  II. 

system  parameters;     a  =  -1.0 

b  =   0.0 
h  =   0.01 

initial  conditions:     x. (o)  =   2.0 

x£(o)  =  -  1.5 

a  priori  distribution;  m.  =  0.0  P-j-tCo)  =  1.0 

mg  =  0.0  P12^°^  =  Oo° 

P22(o)  =  0»6 

noise  levels;         E  [w.  (j)w .(k)~]=     16.0  & 

E  [w2(j)w£(k)]=   olo 
E  [v(j)v(k)]  =   9.0  S.k 

w. ,  wx  and  v  are  statistically  independent. 

Discussion;  For  this  data  set  the  parameter  x„  is  an  unknown  constant. 
The  severity  of  the  noise  conditions  is  indicated  by  parts  B  and  F  of 
Plate  II.  In  part  D  it  can  be  seen  that  soon  after  the  known  input  u 
excited  a  transient,  the  estimator  was  able  to  estimate  Xg  very  closely. 
The  estimate  of  x.  is  shown  in  part  H.  The  improvement  in  the  estimate 
of  x.  with  increasing  time  can  be  attributed  to  the  information  in  the 
transient  and  the  improved  estimate  of  x^,. 
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5 A      Example;  As  a  second  example,  a  second  order  system  with  a  randomly 
varying  natural  frequency  was  investigated.  The  continuous-time  version 
of  the  system  is  given  in  the  example  of  section  1.^3.  Because  the  sim- 
ulation was  to  be  conducted  on  a  digital  computer,  this  system  was  approx- 
imated by  the  following  discrete- time  system; 

Xl(t+h)  =  3C,(t)  +x2(t)h  (5.7) 

x^t+h)  =  x2(t)  -  b  x2(t)h  -  [  cQ  +  x3(t)  ]  Xl  (t)h  +  u(t)h  +  Wl  (t)h 

(5.8) 
x3(t+h)  =  x3(t)  -  a  x^tjh  +  w£(t)h  (5.9) 

z(t)  =Xl(t)  +  v(t)  (5.10) 

The  recursive  scheme  of  (2.73),  (2.7^)  and  (2.77)  was  used  to 
obtain  estimates  of  x.. ,  x^  and  x~  in  real  time.  The  equations  describing 
the  estimation  procedure  for  this  problem  are  easily  obtained  by  straight- 
forward substitution  into  the  basic  equations  of  Chapter  II. 

:*  '•■■'  .* 

5>1   Case.  Ill 

The  following  data  pertains  to  the  simulation  results  shown  in 
Plate  III, 


system  parameters;      a  =  0.1 

b  =  0.2 

c0=  3.0 

h°  =  0.01 

initial  conditions;     x. (o)  =   0.5 


x2(o)  =   0.5 
x3(o)  =  -  0.5 


a  priori  distribution; 


m1  =  0.0         P11(o)  =  1.0         V22^  =  °»5 
Dio  =  0.0  P-,o(o)  =  0.0  P?-5(o)  =  0.0 

m3  =  0.0  p13(o)  =  0.0  p^(o)  =  0.5 
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noise  levels;      E  [w-j  (j)w1  (k)J  =    0.16  6., 

E  [w2(j)w2(k)]  =  225.0   6jk 
E  Lv(j)v(k)]   =    0.64  S.k 

w.. ,  w2  and  v  are  statistically  independent 0 

Discussion;  In  this  data  set  large  and  rapid  changes  in  the  natural  fre- 
quency [cQ  +  Xo]  occur o  The  noise  conditions  are  moderate  as  shown  in 
parts  B  and  C  of  ELate  III.  The  real  time  estimates  of  the  state  variables 
are  seen  to  follow  satisfactorily  the  rapid  changes  in  the  state  variables. 
Note  that 9  since  Xg  is  the  discrete- time  analogy  of  the  derivative  of  Xj , 
the  job  of  estimating  this  quantity  is  intrinsically  difficult.  Again 
there  is  improved  behavior  after  the  known  input  u  excites  a  transient. 
The  behavior  of  the  P  matrix  shown  in  parts  J  through  0,  is  typical  of 
the  behavior  observed  in  a  large  number  of  runs.  The  total  computer  time 
used  for  the  entire  simulation  was  0.?4  minutes. 

5.42   Case  IV 

The  following  data  pertains  to  the  simulation  results  shown  in 
Plate  IV. 


system  parameters;      a  =  0.0 

b  =  1.0 

c  =  0.0 

h  =  0.01 


initial  conditions;     x..(o)  =  0.0 

x2(o)  =  0.0 
X3(o)  =  4.0 

a  priori  distribution; 

m1  =  0.0  P^Co)  =  0.0  P22^0^  =  °'° 
m2  =  0.0  P-|2(°)  =  °«°  P23(°)  =  °»° 
m^  =  1.0         P-^Co)  =  0.0         P33(o)  =  4.0 
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noise  levels;  E  [w1  (j)w1  (k)]  =     1.0      5  -k 

E  [w2(j)w2(k)]  =     0.0 
E  [v(j)v(k)]       =     0.01    cTJk 

w.j ,  w2  and  v  are  statistically  independent. 

Discussion;  For  this  data  set  there  is  no  known  input.  The  natural  fre- 
quency x~  is  an  unknown  constant.  The  noise  condition  is  very  severe  as 
indicated  by  part  A  of  Plate  IV.  The  estimate  of  x~  takes  about  750 
sampling  periods  to  overcome  the  error  in  the  a  priori  estimate  and  settle 
on  a  good  estimate  of  x,,  Once  this  has  occurred,  the  estimates  of  the 
other  state  variables  also  improve.  The  total  computer  time  used  for 
the  entire  simulation  was  0.64  minutes. 

5M       Case  V 

The  following  data  pertains  to  the  simulation  results  shown  in 
Plate  V. 


system  parameters;      a  =  0.0 

b  =  0.5 

c0  =  0.0 

h  =  0.01 


initial  conditions;     x.  (o)  =  0.0 

x^o)  =  0.0 


^(o)  =  3.0 


a  priori  distribution; 


m.,  =  0.0  P11(o)=  1.0  Vz2^  ~  Oo-5 
m2  =  0.0  Pl2^  =  Oe0  p23^0^  =  °9° 
ra^  =  0.0        Pi3(°)  =  °«>0        P^Co)  =  4.0 

noise  levels;  E  [^  ( j )w1  (k)]  =  0.25  &ik 

E  [w2(j)w2(k)]=  0.0 

E  [v(j)v(k)]  =  1.0  6^ 

w^  8  w2  and  v  are  statistically  independent, 
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Discussion;  For  this  data  set  the  natural  frequency  of  the  system  is 
again  an  unknown  constant;  but  this  time  there  isa  known  input.  The  noise 
levels  are  moderate  as  indicated  by  parts  B  and  C  of  ELate  V„  Again  notice 
that  the  estimate  of  x~  overcomes  the  error  in  the  a  priori  estimate  soon 
after  the  known  input  excites  a  trans iento  The  overall  performance  of 

the  estimator  is  quite  satisfactory.  The  behavior  of  the  P  matrix  is 

* 
typical.  Note  that  the  "variance"  p^_  decreases  rapidly  as  the  estimate 

* 
Xo(klk)  improves.  The  total  computer  time  used  for  the  entire  simulation 

was  0.68  minutes. 
5.44   Case  VI 

t 

The  following  data  pertains  to  the  simulation  results  shown  in 
Plate  VI. 


system  parameters;      b  =  0.5 

c  =  2.0 
h  ='  0.01 


(system)  a  =  0.1 
(estimator)  a  =  0.2 

initial  conditions;     x. (o)  =  0„5 

x2(o)  =  0e5 
x~(o)  =  7.0 


a  priori  distribution; 


m,  =  0.0  P11(o)  =  0.5  P22^0^  =  1o° 
mg  =  0.0  P12^0^  =  0e0  P23(°)  =  °o° 
m^  =  0.0         P-j3(o)  =  0.0        Vjj(o)   =  4.0 

noise  levels;  E  [w1(j)w1(k)]  =    0.36  S.^ 

(system)  E  |w2(j)w2(k)]  =    0.0 

(estimator)  E  Lw2(j)w2(k)l  =  225.0   ^ 

E  fv(j)v(k)J  =    1.0  £.k 

w.j ,  w2  and  v  are  statistically  independent, 
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Discussion;  For  this  data  set  the  estimator  had  imperfect  information 
about  the  statistical  behavior  of  x~.  The  actual  value  was  the  geometric 
curve  obtained  as  the  output  of  the  following  system; 

x3(t+h)  =  x^(t)  -  0.1  3u(t)h  x^(o)  =  7.0 

The  estimator  was  designed  under  the  assumption  that  Xo(t)  was  the  out- 
put of  a  linear  system  of  a  different  time  constant  excited  by  white  noise, 

The  noise  conditions  were  moderate  to  severe  as  is  indicated  by 
parts  B  and  C  of  ELate  VI „  Despite  the  large  error  in  the  a  priori  es- 
timate and  the  misinformation  about  the  statistical  nature  of  Xo,  the  es- 
timator quickly  overcomes  the  initial  error  and  gives  satisfactory  per- 
formance, even  before  the  known  input  u  excites  a  transient  in  the  system. 
The  total  computer  time  used  for  this  simulation  was  0«76  minutes. 

5,5       Conclusions 

In  all  cases  studied  the  elements  of  the  P  matrix  were  well 
behaved.  In  fact,  they  were  so  well  behaved  that  it  seems  likely  that 
in  an  application  of  these  techniques,  a  simplified  version  of  the  filter 

could  be  obtained  by  presetting  the  values  of  the  diagonal  elements  of 

* 
the  P  matrix  to  agree  with  the  average  behavior  of  these  elements  in 

previously  conducted  simulation  studies 6  This  would  be  very  desirable 

if  an  analogue  computer  were  to  be  used,  since  it  would  reduce  the 

number  of  multipliers  needed. 

No  instability  of  the  estimator  was  observed. 

The  effect  of  known  inputs  is  to  excite  transients  which  contain 
information  about  the  system  parameters  and  simplify  the  estimation 
problem.  The  exact  nature  of  the  known  input  does  not  seem  to  matter  as 
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long  as  transients  are  excited. 

The  performance  of  the  estimator  in  Case  VI  indicates  than  an 
estimator  designed  on  the  assumption  that  unknown  parameters  are  the 
outputs  of  linear  systems  excited  by  white  noise,  can  give  good  estimates 
even  when  this  assumption  is  apparently  unjustified.  This  is  very  im- 
portant since  information  about  the  statistical  behavior  of  a  parameter 
may  be  limited.  This  is  also  an  indication  of  the  power  of  the  model 
which  treats  random  parameters  as  the  outputs  of  linear  systems  excited 
by  white  noise.  In  practice  the  time  constants  of  these  fictitious  linear 
systems  would  be  chosen  to  reflect  the  anticipated  rapidity  of  the  en- 
vironmental changes e 

The  quality  of  the  estimates  depends  on  the  difficulty  of  the 
problem;  that  is,  on  the  presence  of  known  inputs  and  on  the  noise  levels. 
The  examples  presented,  for  the  most  part,  represent  cases  of  rather 
severe  noise  conditions  in  which  satisfactory  estimates  were  repeatedly 
given.  A  noisy  noise  annoys  an  oyster  but  apparently,  the  nonlinear 
estimator  is  a  pearl  of  a  different  sort. 

The  method  of  successive  linearization  converged  rapidly  to  the 
solution  of  the  nonlinear  two=point  boundary  value  problem  in  the  cases 
investigated.  The  first  approximation jx  (k|n)J  was  excellent  in  all  cases 
and  could  be  used  with  confidence  as  the  starting  point  in  any  method 
of  successive  approximations. 
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—  liL-4-- 


A:  Known  input  u(k) 
Max:  10.0 


B:  Total  input  u(k)  +  w(k) 
Scale  of  A 


Observed  output  z(k) 
Max:  16.5 


State  variable  x-j  (k) 

Max:  9.5 


Real  Time  Estimate  x^(klk) 
Scale  of  D 


L1 


Successive  Linearization  x^Ckln) 
i=  1,  2,  3   Scale  of  D 


PLATE  I 
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State  variable  x?(k) 
Max:  0.51 


*%* 


H:  Real  time  estimate  X2(k|k) 
Scale  of  G 


Successive  linearization  x2  (k|n) 
i  =  1 ,  2,  3   Scale  of  G    i 


PLATE  I  (con't. ) 
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A:  Known  input  u(k) 
Max:  10.0 


B;  Total  input  u(k)  +  w(k) 
Scale  of  A 


C:  State  variable  XgCk) 
^(k)  =  -  1.5 


D;  Real  time  estimate  XgCklk) 
Scale  of  C 


E:  Successive  Linearization  Xg. (kin) 
1=1,2,3  Scale  of  C    i 


PLATE  II 
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Observed  output  z(k) 
Max:  13.4 


State  variable  x1 (k) 

Max:  4.4 


H:  Real  time  estimate  x-j(klk) 
Scale  of  G 


Successive  linearization 


M 


(kin) 


J:     Successive  linearization 
c1. 


(km) 


i=1 


Scale  of  G 


i  =  2,  3 


Scale  of  G 


PLATE  II   (con't. ) 
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Known  input  u(k) 
Max:  5.0 


Total  input  u(k)  +  w(k) 
Scale  of  A 


C:  Observed  output  z(k)  =  x1 (k)  +  v(k) 
Max:  4.97 


D:  State  variable  x.  (k) 


Scale  of  C 


E:  Real  time  estimate  x.,(klk) 


Scale  of  C 


PLATE  III 
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F:  State  variable  Xp(k) 
Max:  3.5 


G:  Real  time  estimate  x^(k|k) 
Scale  of  F 


?*«* 


**W 


a/V 


H:  c(k)  =  cQ  +  x3(k) 


J:  P^  (k)  Scale  of  0 


I :  Real  time  estimate 

c*(klk)  =  c  +  x^klk) 
Scale  of  H 


K:  P12(k)  Scale  of  0 


L:  P*o(k)  Scale  of  0 


H:  Po9(k)  Scale  of  0 


22 


N:  P23(k)  Scale  of  0 


0:  P33(k)   Max:  3.24 


PLATE  III  (con't. ) 
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Observed  output  z(k)  =  x1 (k)  +  v(k) 

Max:   0.4 


B:  State  variable  x-|  (k) 
Scale  of  A 


C:  Real  time  estimate  x. (k| k) 


Scale  of  A 


i 

>• 

i 

f 

K    . 

"*                           •■  V 

(. 

£i        i 

i 

?J           7 
?     6         * 

V'  /<  JV  'J 

i 

•:»v  '       .    %. 

5     V 

•  -  •  . 

•1 

< 

'•'  if. 

State  variable  X2(k) 
Max:  0.2 


Real  time  estimate  x  (k'k) 
Scale  of  D        2 


State  variable  x~(k) 
Value  +  4.0     J 


(J:  Real  time  estimate  x„(k|k) 
Scale  of  F        J 


PLATE  IV 
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A:  Known  input  u(k) 
Max:  5.0 


B:  Total  input  u(k)  +  w(k) 
Scale  of  A 


C:  Observed  output  z(k)  =  x1 (k)  +  v(k) 
Max:  5.28 


D:  State  variable  x.  (k) 
Scale  of  C 


E:  Real  time  estimate  x..(k|k) 
Scale  of  C 


PLATE  V 
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State  variable  ^(k) 
Max:  2.44 


Real  time  estimate  x^Cklk) 
Scale  of  F 


H:  State  variable  x~(k) 
Value  3.0     ■* 


I: 


Real  time  estimate  x^(klk) 
Scale  of  H        * 


Scale  of  0 


K:  P12(k)   Scale  of  0 


P*3(k) 


Scale  of  0 


M: 


P22(k)   Scale  of  0 


cale  of  0 


PLATE  V  (con't. ) 
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A:  Known  input  u(k) 
Max:  5.0 


B:  Total  input  u(k)  +  w(k) 
Scale  of  A 


C:  Observed  output  z(k)  =  x..  (k)  +  v(k) 


D:  State  variable  x1 (k) 
Scale  of  C 


E:  Real  time  estimate  x..(klk) 
Scale  of  C 


ELATE  VI 
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F:  State  variable  ^(k) 
Max:  1.4 


G:  Real  time  estimate  :x>>(klk) 
Scale  of  F 


Vto***^ 


H:  State  variable  Xo(k) 
Max:  7.0 


Real  time  estimate  x^(k|k) 
Scale  of  H        * 


J:  c(k)  =  cQ  +  x3(k) 


Max:  9.0   ,  cQ  =  2.0 


Real  time  estimate 
c*(klk)  =  on   +  x*(klk) 
Scale  of  J 
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PLATE  VI  (con't. ) 


CHAPTER  VI 


CONCLUSION 


Our  research  has  been  devoted  to  a  fundamental  problem  of  con- 
siderable importance  in  modern  control  theory.  The  linear  problem  has 
been  solved  (Appendix  D)  and  a  scratch  has  been  made  in  the  armor  of 
nonlinearity. 

The  experimental  results  of  Chapter  V  are  extremely  encouraging 
and  indicate  that  the  approximation  techniques  may  have  wide  applicability. 
The  results  show  that  better  performance  can  be  expected  when  some  inputs 
to  the  system  are  known ,  Such  known  inputs  usually  excite  transients , 
the  effect  of  which  can  be  measured  at  the  output.  These  transients  con- 
tain information  about  the  state  of  the  system  and  the  values  of  various 
parameters.  One  can  further  speculate  that  if  information  about  a  parti- 
cular parameter  is  not  available  in  the  observations,  then  the  performance 
of  the  system  is  relatively  insensitive  to  variations  of  that  parameter 
and  the  need  for  accurate  estimation  is  diminished. 

The  approximate  solution  to  the  nonlinear  filtering  problem  is 
immediately  applicable  to  adaptive  control  problems.  Note  that  if  measure= 
ments  can  be  made  of  the  input  and  the  output  of  a  subsystem  containing  a 
particularly  troublesome  parameter,,  then  the  order  of  the  problem  can  be 
reduced  significantly  by  treating  only  that  subsystem. 

The  assumption  that  various  noises  are  Gaussian  is  equivalent  to 
making  a  least-squares  estimate  when  only  means  and  covariances  are  known. 

Like  most  research  programs,  this  report  raises  more  questions  than 
it  answers.  It  is  hoped  that  the  following  list  will  stimulate  thought, 
discussion  and  future  research. 
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A  theoretical  question  of  great  practical  importance  is  that  of 
stability,,  The  stability  of  the  linear  filter  has  been  shown  by  Kalman 
[29]  using  the  direct  method  of  Liapunov  f^cTh  It  seems  likely  that  the 
stability  of  the  nonlinear  filter  obtained  by  linearization  about  the 
present  estimates  can  be  proved  under  certain  restrictions „  What  are 
these  restrictions? 

It  would  be  interesting  to  know  the  most  general  types  of  systems 
for  which  the  probability  density  function  (2„7)  is  unimodal.  Convexity 
of  (2 08)  is  a  sufficient  condition 0 

A  fast  simple  numerical  procedure  for  solving  the  smoothing  problem 
guaranteeing  convergence  would  be  of  grpat  value „ 

The  areas  of  observability  and  controllability  are  suitable  for 
theoretical  research 0 

If  a  stationary  random  process  is  viewed  as  the  output  of  a  linear 
system  excited  by  white  noise ,  then  the  approximation  techniques  might 
be  used  to  identify  the  parameters  of  this  fictitious  system*  It  would 
be  interesting  to  compare  the  results  of  such  an  approach  with  the  more 
conventional  approach  of  measuring  the  correlation  function  or  the  spectrum 
and  then  approximating  the  spectrum  by  a  rational  polynomial. 

It  seems  likely  that  a  combination  of  the  method  of  successive 
linearization  and  the  filtering  technique  could  be  used  to  obtain  better 
estimates  in  real  time0  That  is9  instead  of  linearizing  about  the  present 
estimate,,  one  could  smooth  over  the  past  N  observations „  This  seems  to 
be  a  real  possibility  in  applications  where  sampling  periods  and  computa- 
tion  times  are  of  the  same  order  of  magnitude,, 

The  development  of  approximation  techniques  for  use  in  conjunction 
with  the  dynamic  programming  formulation  of  the  discrete- time  problem 
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would  be  an  important  contribution.  Such  techniques  would  be  immed- 
iately applicable  to  a  wide  range  of  dynamic  programming  problems. 

It  would  be  interesting  to  apply  the  successive  linearization 
technique  to  the  nonlinear  regulator  problem  discussed  in  section  3o8. 

It  would  be  desirable  to  simplify  the  nonlinear  filter  as  much 
as  possible  without  impairing  performance.  Presetting  the  values  of  the 
diagonal  elements  of  the  P  matrix  is  one  possibility,,  Another  possibility 
is  to  use  the  basic  configuration  of  the  filter  but  to  set  certain  para- 
meters by  the  method  of  stochastic  approximation  [38,  56 J . 

Finally,,  there  is  need  for  further  simulation  studies  applying 
the  methods  of  this  report  to  more  complicated  systems. 
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APPENDIX  A 


MATRIX  IDENTITIES 


A1   Quadratic  Forms 

We  begin  by  summarizing  a  few  facts  about  matrices  so  that 
they  will  be  available  for  reference c 

The  length  (or  Euclidian  norm)  of  a  vector  x  is  simply 

||x|   =   l/x/  +  ooo  +Xn2 

Clearly,  ||  x  ||  >  o  for  x  ^  o0 

Let  Q(x)  be  the  quadratic  form  associated  with  a  symmetric 
matrix  Ao  That  is, 

Q(x)  =  x'Ax  =  2_,  a.  xx.  ,  a.  .  =  a.. 

A  is  said  to  be  positive  definite  if  Q(x)  >  o,  for  x  £  o.  Clearly, 
Q(o)  =  Oo  A  is  non=negative  definite  if  Q(x)  -  o,  for  x  ^  o„  Let 
A  and  B  be  non-negative  definite  and  let  C  =  A  +  B„  Then 

£' Cx  =  x» (A  +  B)x  =  x1 Ax  +  x" Bx 

Thus,  C  is  at  least  non-negative  definite 0  G  would  always  be  positive 
definite  if  either  A  or  B  were  positive  definite  and  the  other  were  non- 
negative  definite „ 

Let  D  be  any  matrix.  Then  D'D  is  non-negative  definite,  since 
£'£'££  =  II  25  ll  -  °«  If  D  is  non-singular,  then  D'D  is  positive  definite 
since  in  this  case  Dx  =  o  implies  x  =  o. 
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Conversely,  any  positive  definite  matrix  A  may  be  factored  into 
a  product  of  the  form  D'D  where  D  is  non-singular,,  To  see  this  we  re- 
call that  the  characteristic  numbers  of  a  positive  definite  matrix  are 
positive  and  that  any  symmetric  matrix  may  be  reduced  by  an  orhtogonal 
transformation  to  a  diagonal  matrix  having  its  characteristic  numbers 
on  the  diagonal.  Let  A  be  positive  definite  and  let  T  be  the  appropri- 
ate orthogonal  transformation  (T1  =  T°  ).  Then 

T'AT  -    A    =  diag(  \1  tae0B  \n)  ,  \±   >  o 

A  suitable  D'  is  T8  diag(  ^  ,..«,  7An)-  If  A  were  non-negative 
definite  D  would  be  singular,  since  some  of  the  terms  on  the  diagonal 
of  A  would  be  zero8  If  A  were  positive  definite,  then  A"  could  be 
written  in  the  following  form? 

A~1  =  T  diag(l/  A1 1/X  n)T" 

which  shows  that  A   is  positive  definite „ 

Finally,  if  A  is  non-negative ,  then  H'AH  is  at  least  non- 
negative  definite,  since  x'H'AHx  =|DHx||-  o. 

A2   A  Visual  Approach 

In  the  opinion  of  the  author,  the  use  of  block  diagrams  or  flow 
graphs  enables  one  to  produce  matrix  identities  far  faster  and  more  simply 
than  by  straight  algebraic  techniques „  We  will  view  a  matrix  as  a  linear 
operator o  For  example,  the  equation  £  =  Ax  we  interpret  as  a  system 
A  operating  on  an  input  x  to  produce  a  unique  output  £„  Similarly,  the 
equation  y_  =  ABx  is  given  the  interpretation  that  first  B  operates  on 
x  and  then  A  operates  on  this  result  to  produce  £„  See  Fig,  A-1 c 
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$      A 
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y  =  Ax 


Fig.  A-1 

Some  basic  axioms  of  matrix  theory  and  the  associated  block  diagram 
manipulations  are  given  in  Fig.  A=2. 
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if  A"'   exists 
A  +  B  =  [I  +  BA~1]A 


if  A"     exists 
*       £  +  B  =  AJJ  +  A~1B] 


Fig.  A-2 
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The  question  of  placing  a  matrix  inside  a  feedback  loop  is 
slightly  more  complex  but  is  of  extreme  importance.  Consider  the 
"system"  of  A-3. 


X 

+  * 

■^ 

Zs 

'^ 

< 

/ 

A 

\ 

Fig.  A-3 

This  "sjrstem"  implies  the  equation  y_  =  x-Ay  or  [I  +  A]  y_  =  x.  Note 

-1 
that  the  output  jr  is  specified  uniquely  by  x  only  if  [I  +  A]   exists. 

—1 
Thus,  the  diagram  of  Fig.  A-3  is  valid  only  if  [I  +  A]"  exists,  since 

only  then  is  the  output  uniquely  specified  by  the  input  and  in  our 

basic  concept  of  a  system  we  require  this. 

In  order  for  a  block  diagram  to  represent  a  valid  system  the 
output  of  each  individual  element  of  the  block  diagram  must  be  uniquely 
specified  for  each  possible  input.  The  block  diagram  manipulations  of 
Fig.  A-2  may  then  be  performed  without  affecting  this  uniqueness  to 
obtain  equivalent  systems. 

Some  examples  will  now  be  given  to  show  how  the  ability  to 
picture  a  matrix  operation  in  the  form  of  a  block  diagram  is  of  great 
assistance.  Important  relations  are  obtained  in  Example  3° 


Example  1 .   Show  that 


1.-1 


and 


[I  +  £B]~  =  [  A"1  +  B]~1A 


[I  +  ABf 1  =  B~1[B~1  +  A]~ 1 


if  A°  exists 


-1 
if  B   exists 
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We  use  the  visual  aid  of  the  block  diagram  of  Fig»  A-^,  although  a 
straightforward  algebraic  proof  is  simple  for  this  simple  problem,, 

+ 


■O- 


A  f 


B 


>9— >  A 


B  * 


%> 


A  ( —  B  ^ 


*? — >  B 


>E 


-1 


Fig.  A-4 

Note  that  in  moving  blocks  around  the  loop  it  is  clear  when  the 
additional  assumption  must  be  made  as  to  the  existence  of  some  inverse. 

Example  2.  Consider  the  question ,  "Does  the  existence  of  [  I  +  AB]" 

imply  the  existence  of  [i  +  BA]°'  ?"  If  so,  determine  the  latter  in 

terms  of  the  former.  We  note  that  from  Fig.  A-5  [i  +  AB]"  A  =  A  [I  +  BA]~1  , 

so  the  answer  to  the  question  is  yes. 
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Fig.  A-5 


To  obtain  an  expression  for  the  inverse  we  introduce  the  quantity 

w  =  x-By.  Then,  by  the  definition  of  %  in  the  diagram  on  the  left  side 

of  Fig.  A-5, 

£  =  [I  -  £  [I  +  AB]~1A]  x 

but,  by  the  diagram  on  the  right  side  of  Fig.  A-5, 


w  =  [I  +  BA]~1x 


141 


for  all  x.  Thus 


[l  +  M]"1  =  I  -  5  [I  +  AB]~1  A 


(A.1) 


which  may  be  verified  by  direct  multiplication, 


Example  3<>  'As  a  third  example  we  shall  develop  two  matrix  identities 

of  practical  importance.  Suppose  R~  exists  and  consider  the  expression 

—1 
PH'  [  HPH'  +  R]~  ,  In  Figo  A=6  we  simply  back  things  around  the  loop  to 

obtain  the  first  desired  result  by  inspection. 


PH'  [  HPH«  +  R]~1  =  [I  +  PH'R"1H]"1   PH'R 


-1 


(A-2) 


+ 

V_                        n-1 

-*     -*  r1-^QHl}f^ 

> 

> 

1 

? £ 

-*  H1    ~>    P 

(T 

s 

L-  H   e-  P    *—  h»   «"" 

- 

H^ 

H»  ^-R°1  <— 

H    *— 

Fig,  A=6 


Using   (A, 2)  we  note  that 


I  -  fH'    [HPH1   +  R]-1   H     =     I     -    [I  +  PH!R~1H]~1    PH'R~1H 
Rewriting  I  as  a  products  this  relation  is  equal  to 

[I  +  PH'R"1H]"1     [I+PH'R"1H]     -    [I  +  PH'R"1H]"     PH'R~1H 


Canceling  terms,  we  obtain  the  identity 


I  -  PHfHPH'  +  R]~1  H  =  [I  +  PH'R°1H] 


U-.-1 


(A.3) 


This  result  may  be  verified  by  direct  multiplication.  Note  that  (A.3) 


-1 


could  also  have  been  obtained  by  substituting  PH'  for  B  and  R~'H  for  A 
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in  (Ao1)  and  then  bringing  R~  inside  the  inverse. 

The  easy  way  in  which  the  results  (A„2)  and  (A. 3)  were  obtained 
using  the  graphic  aid  of  the  block  diagram  is  an  indication  of  the 
power  of  this  approach  both  for  obtaining  identities  and  proving  the 
existence  of  inverses. 
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APPENDIX  B 

THE  GENERALIZED  INVERSE  OF  A  MATRIX 

B1   Introduction 

The  concept  of  the  inverse  of  a  matrix  was  first  generalized  by 
Moore  \_ky t   48]  to  include  rectangular  and  singular  square  matrices. 
Later,  Penrose  [50,  51]  independently  introduced  the  same  basic  concept. 
Further  discussion  of  the  generalized  inverse  was  given  by  Greville 

[21,  22]  o 

Penrose  proves  that  for  any  matrix  A  there  is  a  unique  matrix 
A  satisfying  the  following  four  relations; 

AA#A  =  A  (B.1) 

(AA#)'  =  AA#  (B.2) 

A#AA#  =  A#  (B.3) 

(A#A)»  =  A#A  (B.4) 

and  proceeds  to  discuss  the  properties  of  the  generalized  inverse. 

Kalman  [29]  defines  a  pseudo-inverse  of  a  matrix  A  to  be  any 
matrix  A  satisfying  (B.1)  and  shows  how  such  a  pseudo-inverse  may  be 
used  in  linear  filtering  and  prediction  problems. 

This  appendix  is  intended  to  serve  as  an  introduction  to  the 
notions  of  generalized  inverses  and  pseudo-inverses.  We  discuss  their 
use  in  connection  with  the  solution  of  linear  algebraic  equations.  A 
geometrical  interpretation  of  the  generalized  inverse  in  terms  of  orth- 
ogonal projections  is  presented.  For  simplicity,  the  discussion  is 
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limited  to  real  matrices, 


B2   The  Equation  Ax  =  y 


Consider  the  set  of  linear  algebraic  equations; 
a11*1  +  '*  '  '  +a1nxn  =  yi 


am1x1  +  °  "  *  +  Wn  =  ^  <B-5> 

or,  equivalently,  the  matrix  equation  Ax  =  £,  where  A  is  a  known  m  by  n 

matrix,  y_  is  a  known  m- vector,  and  x  is  an  n- vector  to  be  determined. 

If  there  is  no  vector  x  which  satisfies  this  relation,  the  set 

(B.5)  is  said  to  be  inconsistent 0  Otherwise,  the  set  is  consistent  and 

there  may  be  a  unique  solution  or  an  infinite  number  of  solutions. 

Suppose  a  solution  exists  and  we  decide  to  choose  the  smallest 

2 
solution;   that  is,  the  solution  for  which  |[x  ||  or,  equivalently,  -§-  |jx  |    , 

is  minimum „  If  we  introduce  a  vector  of  Lagrange  multipliers  and  treat 

(B«,5)  as  a  constraint,  we  may  minimize 

i  x'x  +  V  (£  -  Ax) 

Setting  the  partial  derivatives  of  this  expression  with  respect  to  x 
and  A  equal  to  zero  yields  the  following  two  equations; 

x  =  A'A  (B.6) 

Z  =  Ax  (B.7) 

Combining  these  equations,  we  obtain 

1  =  AA1  A  (B.8) 
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If  AA'  is  non-singular  we  may  solve  this  equation  for  _A  and  combine 
with  (Bo 6)  to  obtain  the  desired  solution; 

x  =  A»(AA»r1Z  (B.9) 

One  can  easily  verify  that  if  A' (AA1 )"  exists,  then  it  satisfies  (B»1) 
through  (B.4)  and  hence,  is  the  generalized  inverse. 

Suppose,  on  the  other  hand,  that  the  set  (B.5)  is  inconsistent. 

Then  one  might  wish  to  make  a  least  squares  approximation  by  minimizing 

2 
||  Ax  -  vj|  o  If  we  set  the  derivative  of  this  expression  equal  to  zero, 

we  obtain 

A'Ax  =  A»j  (B.10) 

If  a!a  is  non-singular  we  may  solve  this  expression  for  x  to  obtain  the 
desired  result. 

£  =  (A'ArVy.  (B.11) 

If  (A' A)™  A*  esists,  it  satisfies  (B.1)  through  (B„^)  and  hence,  is  the 
generalized  inverse. 

B?   The  Pseudo-Inverse 

A  pseudo-inverse  of  a  matrix  A  is  defined  [29]  as  any  matrix  A* 
satisfying  the  relation 

M+£  =  A  (B.12) 

It  follows  from  (B.12)  that  (A' )+  =  (A+)f  and  that  if  A°1  exists,  it 
is  equal  to  At. 

The  following  procedure,  similar  to  that  given  in  [29]  ,  may  be 
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used  to  obtain  a  pseudo-inverse  for  any  matrix.  We  first  consider  the 
case  in  which  A  is  square.  It  is  well  known  [17]  that  any  square  matrix 
may  be  reduced  to  a  diagonal  canonical  form  by  a  similarity  transformation, 

PAQ  =  E 

where  P  and  £  are  non-singular  and  E  is  a  diagonal  matrix  having  only 
zeros  and ones  on  the  diagonal 0     Then, 

PAQPAQ  =  E2  =  E 

and 

AQPA  =  P~1E£-1  =  A 

Hence,  £P  satisfies  (B„12)  and  is  a  pseudo-inverse  of  A^  Since 

1      1 
£   and  P   exist,  a  non-singular  pseudo-inverse  may  always  be  found 

for  a  square  matrix. 

To  obtain  a  pseudo-inverse  of  a  non-square  matrix  we  make  use 

of  the  fact  that  either  of  the  following  relations  imply  that  B  is  a 

pseudo-inverse  of  A5 

(ABA  -  A)  (ABA  -  A) •  =  0  (B.13) 

(ABA  -  A)'  (ABA  -  A)  =  £  (B.14) 

Direct  substitution  into  (BJ3)  and  (B.14)  respectively  will  verify  that 
the  following  expressions  are  pseudo-inverses  of  A; 

A'(AA')1"  (B.15) 

(A'A)+A«  (B.16) 
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The  pseudo-inverse  in  (BJ5)  and  (BC16)  may  be  computed  by  the  first 
method  since  (A'A)  and  (AA* )  are  square0 

Since  (B,15)  and  (B016)  are,  in  general,  not  the  same,  we  see  that 
the  pseudo-inverse  is  not,  in  general,  unique.  In  fact,  if  B  is  a  pseudo- 
inverse  of  A,  then  so  are 

B  +  (I  -  BA)M  and  B  +  M(I  -  AB) 

where  M  is  an  arbitrary  matrix. 

The  pseudo-inverse  may  be  used  in  connection  with  the  solution 
of  equations  such  as  (Bc5)  because  it  has  the  property  that  any  pseudo- 
inverse  of  A  may  be  used  to  test  for  the  existence  of  a  solution  and 
any  pseudo-inverse  of  A  may  be  used  to  obtain  a  solution  when  one  exists. 
This  property  is  summarized  in  the  following  theorem  which  is  similar 
to  one  proved  by  Penrose  for  the  generalized  inverse. 

Theorem;  The  equation  £  =   Ax,  where  y_  and  A  are  known,  has  a  solution 
if  and  only  if,  for  any  A  satisfying  (B,12),  the  following  relation 
is  satisfied; 

If  (B.17)  is  satisfied  for  some  pseudo-inverse  A',  then  it  is  satisfied 
for  every  pseudo-inverse  of  A  and  x  =  A*^  is  a  solution  of  (B,5)  for  every 
pseudo-inverse  of  A, 


Proof;  i,„(if)0  Let  A'  be  any  pseudo-inverse  for  which  (BJ7)  holds, 
Then  A |  y_  is  a  solution  of  (B,5)„ 

ii„,(only  if).  Suppose  x^   is  a  solution  of  (B,5)»  Then, 

£  =  Mi  =  AA'Ax  =  AA'y_ 
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for  every  pseudo«inverse  of  A. 

iiioo  Let  A"'"  be  any  pseudo-inverse  for  which  (B,17)  is  satisfied  and 
— o 

let  aJ  be  any  pseudo-inverse  for  which  (B.17)  is  not  satisfied.  Then, 

Zf  AA+£  =  AA+AAj^  =  mJ  z 

which  is  a  contradiction0  Hence,  if  (B017)  holds  for  some  pseudo-  in- 
verse, then  it  holds  for  every  pseudo-inverse  and  A'£  is  a  solution  of 
(Bo 5)  for  every  pseudo-inverse  of  A8 

Let  us  return  to  the  problem  of  finding  the  solution  of  (B„5) 
having  the  smallest  norm.  We  are  faced  with  the  problem  of  determining 
a  Lagrange  multiplier  X.  which  satisfies  (B.8)c  Such  a  Lagrange  multi- 
plier will  exist  by  virtue  of  the  theorem,  if  and  only  if, 

I^M^AA')^  (B.18) 

Noting  that  A" (AA5 ) '  is  a  pseudo-inverse  of  A,  it  follows  from  the  theorem 
that  (Bo  18)  will  be  satisfied  and  (B,8)  will  always  have  a  solution  if 
(Be 5)  has  a  solution,  which  was  the  original  hypothesis.  Hence,  we  may 
use  any  (AA' )*  to  obtain 

and  finally,  using  (B„6),  the  smallest  solution  of  (Bo5)  is  given  by 

x^AUAA')1^  (Bo19) 

This  result  differs  from  (B„9)  in  that  we  did  not  have  to  assume 

=1  + 

the  existence  of  (AA")  ,  It  is  interesting  to  observe  that  A« (AA' ) 

satisfies  (Be3)  and  (B»  as  well  as  (B.1), 

If  we  return  to  the  problem  of  finding  the  least  squares  approx- 
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imation,  we  are  faced  with  finding  an  x  which  satisfies  (B.10).  Such 
an  x  will  always  exist  and  we  may  use  any  (A'A)"*  to  obtain 

x  =  (A'A)+Af£  (B.20) 

+ 

which  is  similar  to  (B.10).  We  observe  that  (A'A)  A1  satisfies  (B.2) 

and  (B,3)  as  well  as  (B.1). 
B4   The  Generalized  Inverse 

The  generalized  inverse  of  a  matrix  A  is  the  unique  matrix  sat- 
isfying (B.1)  through  (B„4)„  In  the  previous  section  we  observed  that 
A'CAA')"*"  and  (A'A) 'A'  each  satisfied  three  out  of  the  four  of  these  re- 
lations. Hence,  it  is  not  surprising  that  the  generalized  inverse  may 
be  written  in  forms  resembling  these  expressions. 

We  first  consider  the  case  in  which  A  is  symmetric.  Then  A  may 
be  reduced  by  an  orthogonal  transformation  to  a  diagonal  matrix  having 
the  characteristic  roots  of  A  on  the  diagonal. 

i'AT  =  A_  =  diag(A1  ,eo.,\n) 

M. 

We  form  A  from  A  by  replacing  \^   by  1/^,  for  \.   f   o,  and  leaving 
unchanged  the  terms  of  A  which  are  equal  to  zero.  Then, 

A#  =  T  _A_#T5  (B.21 ) 

A 

It  is  simple  to  verify  that  AF  given  by  (B.21)  satisfies  (B.1)  through 
(Be**).  Since  AA'  and  A'A  are  symmetric,  we  may  find  (AA')  and  (A'A)ff 
in  this  manner.  One  can  easily  verify  that  the  following  two  expressions 
satisfy  (B.1)  through  (B.4)  and  hence,  are  equivalent  expressions  for  the 
generalized  inverse  of  A; 
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A»(AA«)  (B.22) 

(A«A)#A»  (B.23) 

It  is  clear  that  the  generalized  inverse  is  a  particular  pseudo- 
inverse  and  hence,  may  be  used  to  obtain  solutions  to  linear  equations 
when  solutions  exist.  The  generalized  inverse,  in  addition,  has  the 
two  important  properties  of  always  giving  the  smallest  solution  when  a 
solution  exists  and  giving  the  best  least  squares  approximation  when  no 
solution  exists.  These  properties  are  summarized  in  the  following  theorem 
given  by  Penrose  [51 ]  which  we  state  without  proof. 

Theorem;  Let  ||  A  ||  =  trace  A' A.  ^  is  said  to  be  the  best  approximate 
solution  to  the  matrix  equation  AX  =  Y,  if,  for  all  X,  either 

I  AX  -  Z  1  >  II  AXo  -  2  II 

or  ||  AX  -  I  II     =     II  AXc  -  *  I  «*  II I   1    -     II  So  II 

Then  A*Y  is  the  unique  best  approximate  solution  to  the  equation  AX  =  Y„ 


An  alternate  expression  for  the  generalized  inverse  may  be  obtained 

in  the  following  manner.  Reconsider  the  problem  of  finding  a  best  least 

squares  approximation  to  the  solution  of  the  equation  Ax  =  y_.  We  found 

previously  that  any  x  satisfying  A  "Ax  =  A'y_  would  do,  but  we  shall  now 

seek  the  smallest  x  satisfying  this  relation.  We  minimize  the  expression 

2 
£||£  ||   +  ^(A'Ax  -  A'vJ  (B.24) 

Setting  the  partial  derivatives  of  this  expression,  with  respect  to  x 
and  A,  equal  to  zero  we  obtain  the  following  two  equations; 
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A!Ax  =  A5y_ 

£  =  A'A  \ 
These  may  be  combined  to  obtain 

A'AA'A  A,  =  A»£ 
Using  a  pseudo~ inverse  to  solve  for  ),9   we  obtain 

X  =  (A8AA»A)+A»£ 
Combining  (B.25)  and  (B.26)  yields 

2  =  A,A(AlAA»A)tA,£ 

The  matrix  A'ACA'AA'A)"^'  may  be  shown  to  satisfy  (B.1)  through  (Bo2*) 

A 
and  hence,  is  an  expression  for  A  0  In  verifying  this  fact  it  is  help- 
ful to  note  that  A'ACA'AA'A)^  is  a  pseudo-inverse  of  A' A  by  (B.15)»  We 
omit  the  details „  Then, 

A#  =  A'ACA'AA'A)"^1 


(B.10) 
(B.25) 


(B„26) 


(Bo27) 


(B„28) 


There  is  also  a  related  expression  for  Ar  given  by 

A#  "  A1  (AA»  AA°  )tAA8 

Suggestions  for  computing  the  generalized  inverse  are  given  in  [6,  22, 

51]  o 


(B„29) 


In  order  to  give  the  generalized  inverse  a  simple  interpretation 
in  terms  of  orthogonal  projections  we  will  need  some  basic  concepts  about 
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linear  transformations  and  vector  spaces  such  as  may  be  found  in  [2]  . 
We  will  view  the  m  by  n  matrix  A  as  a  set  of  column  vectors 
a*  .....ajj  ,  If  the  equation  %  =   Ax  is  written  in  the  following  form, 
emphasizing  this  point  of  view, 


£1   £2 

i     I 


I " 

x1 

a 

000                 tY\ 

o 

1 

0 

0 

*n 

1 


it  is  clear  that  the  equation  will  have  a  solution  if  and  only  if,  y_ 
is  a  linear  combination  of  the  column  vectors  of  A;  in  which  case,  v_ 
is  said  to  belong  to  the  range  of  A,  y_£R(A).  In  any  case,  we  may  de- 
compose y_  into  the  sum  jr  =  y.  +  v  ,  where  y*  e  R(A)  and  v  is  orthogonal 
to  the  column  vectors  of  A„  A  vector  y_  with  this  property  is  said  to 
belong  to  the  null  space  of  A1,  v  £  N(A'),  since  A1^  =  o„  R(A)  and  N(A') 
are  called  orthogonal  complementary  subspaces  of  m  dimensional  vector 
space,  E01  —  orthogonal,  since  any  member  of  one  is  orthogonal  to  any 
member  of  the  other  —  complementary,  since  their  union  is  E111  —  subspaces, 
since  any  linear  combination  of  vectors  belonging  to  R(A)  (or  N(A' ))  also 
belongs  to  R(A)  (or  N(A'))„  The  intersection  of  R(A)  and  N(A!)  is  the 
zero  vector. 

Suppose  that  S  and  T  are  orthogonal  complementary  subspaces  of 
ET.  Consider  a  transformation  P  on  Er   such  that  if  w  £  JS131  and  w  =  u  +  v, 
where  u^S  and  v  £  T,  then 

Pw  =  u 

Such  a  transformation  P  is  called  the  orthogonal  projection  of  ET  onto  S. 
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Necessary  and  sufficient  conditions  that  P  be  an  orthogonal  pro- 
jection are 


p2  =  p 


P»  =  p 


We  use  the  notation  JL/.\  to  signify  the  orthogonal  projection  onto  R(A). 
Let  us  turn  our  attention  to  the  diagram  of  Fig.  B-1  „  This  figure 
should  be  interpreted  in  the  following  manner;  any  point  in  the  lower 
plane  (ET)  is  mapped  in  the  direction  of  the  arrow  by  the  transformation 
A  to  a  point  in  R(A)  in  the  upper  plane  (E^). 


E"1 


N(A') 


R(A') 


Fig.  B-1 


.15^- 


Consider  the  equation  Ax  =  y_,  where  y_  is  shown  in  Fig.  B-1 . 
This  equation  has  no  solution  since  ^4R(A).  The  point  nearest  to  £ 
for  which  this  equation  has  a  solution  is  y. ,  where  ^  =  P   .y_.  If  we 
now  consider  instead  the  equation  Ax  =  y*  ,  we  see  that  any  point  x  on 
the  line  of  Fig.  B-1  marked  Ax  =  y,  is  a  solution  of  this  equation. 
From  these  solutions  we  choose  x. ,  the  one  closest  to  the  origin.  Note 

JL 

that  x.  e  R(A').  We  would  like  to  have  a  transformation  A  which  would 
perform  these  operations.  Then  x.  would  be  given  by 

*!  =  £#y.  (B.30) 

Let  us  consider  the  properties  such  an  A"  would  possess.  First, 
R(A  )C  R(A')  since  every  point  y_  would  be  mapped  by  Ay  into  a  point 
belonging  to  R(A').  Second,  if  x.  is  to  be  given  by  (B.30),  we  would 
require  Axj  to  be  PR/.\£.  That  is,  for  all  y_ 

*A  =  £R(a/ 


or 

M#  =  £R(A)  (B'31> 

# 

Third,  if  Ax  =  y, ,  then  A  y^  would  be  PR,.,vX„  That  is,  for  all  x 


or 


A^=Pr(ai)  (B.32) 
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The  condition  (B„32)  implies  that  R(A')cR(A#)  since,  if  X6R(A'), 
then  A*  Ax  =  x  and  x  eR(A^).  We  now  have  the  relation"  R( A*) c  R(A" )  and 
R(A')CR(F)  which  implies  R(A»)  =  R(A*)o  Thus,  (B032)  becomes 

A*  A  =  P 

~R(A#)  (B.33) 

The  condition  (B„31 )  is  equivalent  to  (B„1 )  and  (B.2),  and  the  condition 
(Be33)  is  equivalent  to  (B„3)  and  (B„4)0  Hence,  all  the  properties  of 
the  generalized  inverse  follow  from  (Bc31 )  and  (B„33)  and  are  easily- 
given  a  geometric  interpretation  as  is  done  in  Fig.  B-1 « 

B6   Further  Discussion  of  the  Equation  Ax  =  y 

In  this  section  we  shall  prove  three  simple  lemmas  which  will  be 
used  repeatedly  in  Chapter  IV,  We  again  consider  the  linear  equation 

Ax  =  I  (Bo3^) 

where  y_  and  A  are  known  and  x  is  to  be  determined  0 

B6q1   Lemmas 

The  solution  of  (B03^)  is  unique  if  and  only  if  A'A  is  positive 
definite.  If  A'A  is  positive  definite  the  unique  solution  is 

£  =  [A»A]-1A«£  (B,35) 

Proof;   Let  x.  and  x2  be  any  two  solutions  of  (B03^)0  Then 

£  =  A  [^  -  x2] 
and 
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Ui  -%]  A»A  [^  -ay  =  °  <B»36) 

If  A1 A  is  positive  definite  (B,36)  implies  x,  =  Xg  and  the  solution  of 
(B.34-)  is  unique.  If  A1  A  is  not  positive  definite,  let  x  be  any  non- 
zero  vector  such  that  x^  A'Ax^  =  o,  (i.e.,  x.  6  N(A))„  If  x1  is  a  solution 

of  (Bo3^)  then  so  is  x,  +  3c  and  the  solution  of  (B.3^)  is  not  unique, 

—1 
The  solution  is  given  by  (B,35)  since  [A' A]  A1  is  the  generalized  inverse 

of  A, 

B6,2   Lemma; 

A  solution  of  (B.3^0  will  exist  if  and  only  if  y_  e  R(AA' );  in  which 
case  the  solution  of  smallest  Euclidian  norm  is  given  by 

x=A»(AA«)#£  (B.37) 

Proof;  Using  the  theorem  of  section  B3  and  recalling  that  A' (AA1 )*  =  A#, 
we  observe  that  (B,3^)  will  have  a  solution  if  and  only  if 

I  =  AA#y.  =  AA'  (AA'  )#£ 

Hence ,  (B,3^)  will  have  a  solution  only  if  £  £  R(AA'),  Moreover ,  by  the 
same  theorem,  £  =  AA'A  has  a  solution,  (i.e.,  y_  g  R(AA'),  if  and  only  if 

1  =  AA'  (M1  )*Z 
The  solution  of  smallest  Euclidian  norm  is  given  by  (B.37)  since  A' (AA* )#  =  A*. 
B6o3   Lemma; 

A  solution  of  (B.3^)  will  exist  for  all  y_  if  and  only  if  AA1  is 
positive  definite;  in  which  case  the  solution  of  smallest  Euclidian  norm 
is  given  by 
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x  =  A'(AAT1Z 

Proof;  A  solution  of  (B.32^)  will  exist  for  all  £  if  and  only  if  all 
y.  £  R(A),  or  equivalently  if  and  only  if  N(Af)  =  o.  But  a  vector  v 
belongs  to  N(A" )  if  and  only  if  y  M'Zq  =  Oo  Bnis,  N(A')  =  o  if  and 
only  if  AA'  is  positive  definite. 
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APPENDIX  C 


GAUSSIAN  RANDOM  VECTORS 


In  this  appendix  we  summarize  some  known  facts  about  Gaussian 
random  vectors  or,  equivalently,  about  the  multivariate  normal  dis- 
tribution,, Special  attention  is  given  to  situations  in  which  the 
covariance  matrix  is  singular.  The  pseudo-inverse  and  generalized 
inverse  of  Appendix  B  are  shown  to  be  applicable  to  these  situations. 

If  x  is  an  n-dimensional  random  vector  with  mean  m  and  covariance 
matrix  V  =  E  [ (x-m) (x-m) ■ ]  ,  we  shall  say  that  x  is  a  Gaussian  random 
vector  if  its  characteristic  function  is  of  the  following  form; 

Cx(iw)  =  E  [exp  iw'x  ]  =    exp  iw'm  *^  i  w'Vw  (C.1) 

If  V  is  non-singular  (positive  definite)  the  probability  density  function 
for  x  takes  the  following  form; 

1  r  2       -, 

f(x)     =  n/2  T~  exp       °  i||  x-m  ||      , 

(2Tf)n/2     |V|  2  L         »--  V1J  (C.2) 

The  definition  of  a  Gaussian  random  vector  in  terms  of  the 
characteristic  function  has  been  suggested  [29]  in  order  to  avoid  the 
difficulties  which  arise  when  V  is  singular.  The  results  to  be  presented 
in  this  appendix  have  been  derived  elsewhere  [29]  using  a  different 
approach  based  on  the  characteristic  function.  We  shall  use  an  approach 
which  is  more  in  keeping  with  the  body  of  this  work. 

A  vector  u  will  be  called  a  normalized  Gaussian  random  vector  if 
it  is  Gaussian  with  mean  E(u)  =0  and  covariance  matrix  E(uu')  =  I.  Any 
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Gaussian  random  vector  x  can  be  considered  as  having  been  obtained  from 
a  normalized  Gaussian  random  vector  of  the  same  dimension  by  a  linear 
transformation  of  the  following  form; 

x~m  =  Tu 

Then, 

E(x)  =  m  +  T  E(u)  =  m 

E  [(x-m)(x-m!)]  =   TE(uu»)T'  =  TT»  =  1 

It  is  shown  in  Appendix  A  that  a  non-negative  definite  matrix  V  may 
always  be  factored  into  a  product  TT1,,  It  will  suffice  for  our  purpose 
to  know  that  the  transformation  T  exists ,  since  we  shall  never  have  to 
compute  T« 

Let  x-i  and  Xp  be  Gaussian  random  vectors  having  the  joint  distri- 
bution specified  by 


x  = 


£1 


% 


m  = 


— 1 


S2 


V  = 


lu     212 

V    V 
^21  -22 


where  V  may  be  singular 0 

We  consider  x  to  be  the  result  of  a  transformation  on  a  norm- 
alized Gaussian  vector  u0  Then, 


x-m  =  Tu 


»  IT  =  1 


(C3) 


Let  H  be  the  matrix  [0  I]  so  that  x?  =  Hx„  Suppose  that  an  experiment 
is  performed  in  which  it  is  learned  that  the  value  of  Xo  is  z.  Having 
learned  the  value  of  Xo  we  want  to  know  the  a  posteriori  distribution 
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for  x.  o  The  a  posteriori  distribution  for  x-j  will  be  Gaussian  and 
hence ,  will  be  specified  by  its  mean  and  its  covariance  matrix.  To 
determine  the  a  posteriori  mean  we  minimize 

2 

i  I  H  II  +  ±*  (z~Hm-HTu)  (C.4) 

That  is,  we  locate  the  mode  (or  mean)  of  the  a  posteriori  distribution 
for  u.  The  Lagrange  multiplier  has  been  introduced  to  insure  that  Xo 
takes  on  the  observed  value  z_c  Minimizing  (C.4)  is  exactly  the  same 
type  of  problem  as  was  discussed  in  Appendix  B.  Proceeding  as  before, 
we  set  the  partial  derivatives,  with  respect  to  u  and  Ji,  of  (C.^)  equal 
to  zero  to  obtain  the  following  two  equations  for  the  minimizing  value  u; 

HTu  =  z-Hm  (C.5) 

u  =  T'H»A  (C.6) 

Combining  (C„5)  and  (C.6)  we  obtain 

HTT'H'  \   =  z-Hm  (C.7) 

As  was  shown  in  Appendix  B,  a  \   satisfying  (C„7)  will  always  exist  if 
there  is  a  u  which  satisfies  (Cc5)c  There  will  be  a  u  which  satisfies 
(C.5)  unless  the  observed  value  of  Xg  has  probability  zero;  that  is, 
unless  the  singularity  of  the  a  priori  covariance  matrix  V  makes  the 
observed  value  inconsistent  with  prior  knowledge.  We  rule  out  this 
possibility  since  it  leads  to  division  by  zero  in  Bayes1  Theorem  and 
violates  our  basic  assumption.  Values  of  Xp  for  which  a  solution  to 
(C.5)  exists  will  be  called  consistent.  Substituting  for  H  and  TT' 
in  (C.7)  yields 
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122  k  =  2-m2  (C.8) 

Any  pseudo-inverse  may  be  used  to  solve  for  A..  Then, 

A  =  ^2  (2.-E2)  (Go9) 

Combining  (C.9)  and  (C.6)  gives  the  following  equation; 

u  =  T»H'V22(z-m2) 

Finally,  using  (C.3).  we  obtain 

x-ra  =  VH»v£2(z-m2)  (CJO) 

This  is  equivalent  to  the  two  equations 

^2  =  -2  +  l22l2Z^-"-2^  (C12) 

From  the  pseudo-inverse  theorem  of  Appendix  B  we  know  that  the  existence 
of  a  solution  of  (C„8)  implies  the  following  relation; 

2-2!2  =  Xg^Cz  -m2)  (C„  13) 

Combining  (C,12)  and  (C13)  leads  us  to  the  consistent  result  x2  =  z  . 
The  basic  result  is  (C011)  and  is  summarized  in  the  following  statement „ 

Statement  C-1 %     The  conditional  expectation  of  x. ,  given  a  consistent  x2 
is  given  by  the  following  relation; 

ll  =  3,  +V12V22(x2-m2) 

We  may  obtain  a  useful  relation  from  (C«13)  by  using  the  fact 
that  for  all  consistent  values  of  x2  the  following  relation  is  satisfied; 
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t 

£2~-2  =  -22-22  ^-2~-2^  (C,1^) 

Postmultiplying  both  sides  of  (C.14)  by  (x.-^)1  and  averaging  with  re- 
spect to  the  joint  a  priori  distribution  yields 

t 

V  =  V  V  V  (C.15) 
-21   -ZZ-Z2r-Z\  ^,I;; 

Taking  the  transpose  of  (C.15)  gives  the  following  useful  relation; 

V  =  V  V  V  (CJ6) 
Hi       112122122  K   °      J 

In  the  development  of  an  expression  for  the  covariance  matrix 
of  the  a  posteriori  distribution  for  x. ,  given  Xp»  the  concept  of  pre- 
posterior  analysis  [5^]  will  be  used.  Given  any  value  of  Xg  we  may  find 
the  a  posteriori  distribution  for  x.  which  will,  in  general,  depend  on 
the  value  of  x?e  Before  the  value  of  x2  ^s  known*  parameters  of  the  a 
posteriori  distribution  may  be  considered  to  be  random  variables,  Pre- 
posterior  analysis  is  concerned  with  the  a  priori  probability  of  obtaining 
particular  values  for  parameters  of  the  a  posteriori  distribution «,  We 
shall  need  the  following  simple  lemma; 
Lemma;  Let   E  [0(x..)]  be  the  expected  value  of  some  function  0(x.) 

with  respect  to  the  a  posteriori  distribution,,  Then  the  expected  value 


of   E  [0(x.)]  with  respect  to  the  a  priori  distribution  for  x?  is  the 

£ll£2 
a  priori  expected  value  of  0(x^)»  That  is, 


x. 


E  [0(x1)]l  =   E  [0^)] 


x4  x. 


-2  L -11-2 


Si 

The  proof  is  by  inspection  since  on  the  left  side  of  this  expression 
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the  average  is  with  respect  to  the  joint  a  priori  distribution  for  Xj 

and  Xpo 

Using  this  lemma  we  may  easily  prove  the  following  theorem  given 

in  |>]o 

Theorem;  The  expected  value  with  respect  to  x?  of  the  covariance 
matrix  of  the  a  posteriori  distribution  plus  the  covariance  matrix 
with  respect  to  x2  of  the  mean  of  the  a  posteriori  distribution  is .equal 
to  the  covariance  matrix  of  the  a  priori  distribution  for  x, „  That  is, 


E  J   E  teVl 


*~  teJ2L 


-1-1 


E     (5,) 


^i^ 


X.     X 

-1I_2 


-l'-2 


+     E 


E     (x 

-1  '-2 


E     (x, )  " 


x    x 

-1  '-2 


E 
*2 


E    (35,) 

x,  |x       ' 
-1  I-2 


E 

x 

1-2 


E     (£|) 


x    IX 
-l'~2 


=     E    [^x/]     -       E    [x^     [E    [x^ 


5| 


=-1 


^ 


The  proof  is  immediate  by  canceling  like  terms  and  applying  the  lemma e 
This  theorem  is  sometimes  stated  for  scalar  random  variables  in  the 
following  concise  form0  The  mean  of  the  posterior  variance  plus  the 
variance  of  the  posterior  mean  is  equal  to  the  prior  variance „ 

Statement  C°°2;  The  covariance  matrix  with  respect  to  x^  of  the  mean  of 
tne  a  posteriori  distribution  for  x^  is  V.  Jk  Jt>i  •  'Ihis  sta"teinen't 
follows  from  C~1  since 


E  [(^-^(Xi-a,)1]   = 


% 


L  j£ JL  v+  v 

-1 2-22-22-22-21 


Using  (C015)  or  (C«16)  concludes  the  proof, 
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Statement  C-3;     x.   -       E     (x  )  is  independent  of  x  . 

-1'-2 
To  show  this  we  compute  the  covariance  matrix  of  this  quantity 

and  x  o     From  Statement  C-1   we  have 
-2 


E    [(Xj   -       E       (x^))(x2-m2)»] 
— 1  '—2  ^-Z 

X|XIH-1        -12-22v~2  -2/J       v-2  -2'  J  -12       -*2-22-22 

-^1   —2 


This  quantity  is  equal  to  0  by  (C016)« 

This  statement  implies  that  the  covariance  matrix  of  the  a  poster- 
iori distribution  for  x.  is  independent  of  the  observed  value  of  x^  a^d 
hence,  the  expected  value  with  respect  to  x  of  this  quantity  is  equal 
to  the  quantity  itself.  Using  this  fact  and  the  theorem  leads  to  the 
following  statement. 

Statement  C-4;  The  covariance  matrix  of  the  a  posteriori  distribution 
forx,  given  %  is  V„  -  I^J*^. 

The  generalized  inverse  may,  of  course,  always  be  used  in  place 
of  the  pseudo- inverse.  The  use  of  the  pseudo-inverse  has  been  consistent 
with  our  effort  to  use  a  notation  compatible  with  [29]  .  In  practice  one 
might  prefer  to  use  the  unique  generalized  inverse. 
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APPENDIX  D 

GENERAL  SOLUTION  OF  THE  DISCRETE-TIME  LINEAR  ESTIMATION  PROBLEM 

D-1   Introduction 

This  appendix  is  devoted  to  a  study  of  a  very  general  form  of 
the  linear  discrete- time  estimation  problem  in  which  the  covariance 
matrices  may  be  singular.  For  the  sake  of  generality  and  at  the  sacrifice 
of  some  simplicity,  we  allow  correlation  between  the  random  input  w  and 
the  measurement  noise  v.  Necessary  background  material  appears  in  Chapter 
II  and  Appendices  B  and  C. 

D-2   Problem  Statement 

Consider  the  system 

x(k+1  )  =  F(k)x(k)  +  G(k)w(k)  (D„1  ) 

z(k)  =  H(k)x(k)  +  v(k)  (D.2) 

where  x  is  the  state  vector  and  z   is  the  observation.  {  w(k)],  the  random 
input  and  (v(k)j,  the  measurement  noise,  are  sequences  of  independent  ran- 
dom variables  with  zero  means  and  the  following  covariance  matrices; 

E  [w(j)  w«(k)]  =  £(k)  s^ 

E  fz(J)  ▼' 00]  =    £(k)     <£.k  3r  all  intea 

E  [w(j)  v»(k)]   =     S(k)    JJk 

Based  on  a  finite  observation  sequence  j  z(o),  ...,z_(n)j ,  we  seek  an 
estimate  of  the  sequence  of  states  {x(o)f ...,x(n4m)j . 
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Suppose  that,  given  the  observed  sequence  {  z^o),... ,z(k-1 )j, 
the  distribution  for  x(k)  has  mean  x(klk=>1)  and  covariance  matrix  P(k), 
Consider  the  vector 

x(k|k~1) 
H(k)x(klk-1) 

and  having  the  following  covariance  matrix; 


x(k)~ 

~*(k) 

= 

with  mean 

l(k) 

H(k)x(k)  +  v(k) 

P(k) 
I(k)P(k) 


P(k)H« (k) 
H(k)P(k)H'(k)  +  R(k) 


If  _z(k)  is  observed,  we  may  use  statements  (C=»1  )  and  (C-4)  from 

Appendix  C  to  obtain  the  following  expressions  for  the  mean  x(klk) 

and  covariance  matrix  C(k)  of  the  distribution  for  x(k)  given  {  z(o),.e.  9_z(k)J  | 

x(k|k)  =  x(k|k-1)  +  P(k)H»(k)  [H(k)P(k)H»(k)  +  R(k)J  + 


[z(k)-H(k)x(k|k~1)] 
C(k)  =  P(k)-P(k)H'(k)  [H(k)P(k)H»(k)  +  R(k)]f  H(k)P(k) 


(D.3) 


Similarly,  consider  the  vector 


x(k+1 ) 
z(k) 


F(k)x(k)  +  £(k)w(k) 
H(k)x(k)  +  v(k) 


with  mean 


and  having  the  following  covariance  matrix^ 

F(k)P(k)F« (k)  +  G(k)£(k)G» (k) 
H(k)P(k)F» (k)  +  S' (k)G' (k) 


F(k)x(k!k=1) 

H(k)x(k|k=1) 


F(k)P(k)H» (k)  +  G(k)S(k) 
H(k)P(k)H» (k)  +  R(k) 
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If  z(k)  is  observed  we  may  again  use  statements   (C-1  )  and  (C-A)  to 
obtain  the  following  expressions  for  the  mean  x(k+1 1 k)  and  covariance 
matrix  P(k+1 )  of  the  distribution  for  x(k+1 )  given  /  z^(o ),.<,<,  ,z(k)j  ; 

x(k  +1|k)  =  F(k)x(k|k~1)  +  f  F(k)P(k)H*  (k)  +  G(k)S(k)] 

[H(k)P(k)H»  (k)  +  R(k)]+[z(k)-H(k)x(k|k=1)]  (D.5) 

P(k+1 )  =  F(k)P(k)F» (k)  +  G(k)£(k)G« (k) 

-  [  £(k)P(k)H» (k)  +  Q(k)S(k)]     [H(k)P(k)H» (k)  +  R(k)]f 

[  H(k)P(k)F» (k)  +  S« (k)G« (k)]  (D.6) 

With  practically  no  effort  we  have  obtained  the  solution  to  the 
linear  filtering  problem  and  the  solution  to  the  problem  of  predicting 
one  step  ahead.  Equations  (DD5)  and  (D„6)  are,  except  for  slight  details, 
the  same  ones  as  those  given  in  [29]  . 

Note  that  x(k+1  |k)  is  not  simply  an  extrapolation  of  x(k|k)„ 
This  is  due  to  the  fact  that  information  about  w(k)  is  available  in 
z_(k)  because  w(k)  and  v(k)  are  correlated,,  However 9  x(k+2|k)  and  pre- 
dictions further  into  the  future  are  simply  extrapolations  of  x(k+1  I k) , 
since  no  information  is  available  about  future  values  of  w0  Since  it 
is  now  evident  how  to  make  predictions ,  we  may  confine  ourselves  to  the 
problem  of  estimating  [  x(o) , c „ „ ,x(n+1 )J  given  {  z (o) , . 0 0 ,z (n) j „ 

D^   Preliminary  Considerations 

Let  r(k)  =  G(k)w(k)„  and  consider  the  vector 


£(k) 
l(k) 


with  covariance  matrix 
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G(k)£(k)G8(k)   G(k)S(k) 
S»(k)G»(k)     R(k) 


We  consider  the  vector  formed  by  combining  r  and  v  to  have  been 
obtained  by  a  transformation  on  a  normalized  vector  u(k),  E[u(j)u'(k)]  =  I  <$".,« 


r(k) 

l(k) 


T(k)u(k),     T(k)T«(k) 


G(k)£(k)G'(k)   G(k)S(k) 
S» (k)G' (k)     R(k) 


Let  us  introduce  the  matrices  V  =  [0,1  ]  and  W  =  [1,0]  such  that 

r(k)  =  WT(k)u(k)  (D.7) 

v(k)  =  VT(k)u(k)  (D.8) 

WT(k)T«(k)W'  =  G(k)£(k)G!(k)  (D.9) 

VT(k)T'(k)V»  =  R(k)  (DJO) 

WT(k)T» (k)V'  =  G(k)S(k)  (D  J 1 ) 

The  £,  priori  distribution  for  the  initial  state  is  Gaussian  with 
mean  m  and  covariance  matrix  P(o)  which  may  be  singular „  We  shall,  con- 
sider the  initial  state  as  having  been  obtained  by  a  transformation  on  a 
normalized  vector  0 

x(o)  -  m  =  AG  AA1  =  P(o)        (DJ2) 

D5   Derivation  of  the  General  Solution 

Introducing  the  Lagrange  multiplier  vectors  jj_,  A(k)  and  ^(k)  „   we 
may  minimize  the  following  expression; 

In1    =|l|6    ||       +£'  [x(o)»m-Ae]    *       £   {i'00    [z(k)»H(k)x(k)^VT(k)u(k)] 
'  k=o 

2  .. 

+  illH(k)||      +    >(k)    [x(k+1)-F(k)x(k)-WT(k)u(k)]j  (D.13) 
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Setting  equal  to  zero  the  partial  derivatives  of  I_  - ,  with  respect  to  u(k), 
£(k),  2(k)  (for  k=o90.„,n);  x(k)  (for  k=o,...,n+1 );  and  e,  yields  the  follow- 
ing relations j 


0  =  A«e 


£  =  H°  (o)j;(o)  +  F«(o)i(o) 

u  =  T'(k)V»£(k)  +  T'(k)W»A(k) 


A(k-1 )  =  F' (k)A(k)  +  H» (k)i(k) 


X(n)  =  o 


x(o|n)  =  m  +  A8 


x(k+1 |n)  =  F(k)x(kln)  +  WT(k)u(k) 


K   I  j  e  a   e  j" 


k— o  1  •  o  9  9  n 


(D.14) 
(Do  15) 
(D.16) 
(D.17) 
(D.18) 

(P.19) 

(D.20) 
(D.21  ) 


z(k)  =  H(k)x(k|n)  +  VT(k)u(k) 
Combining  (Dd^),  (De15)  and  (DJ9)  and  observing  the  AA'  =  P(o)  yields 

x(o In)  =  n  +  P(o)H» (o)i(o)  +  P(o)F8 (o)A(o)  (D022) 

1 

Combining  (Dj6)  and  (D*20)  and  using  (D„9)  and  (D011),  we  obtain  the 
following  equation; 

x(k+1|n)  =  F(k)x(kln)  +  G(k)S(k)?(k)  +  G(k)£(k)G» (k)A(k)         (D„23) 

Using  (DJO),  (D.11)  and  (D,16),  we  rewrite  (D«21 )  in  the  following  form; 

z(k)  =  H(k)x(k|n)  +  R(k)£(k)  +  S» (k)G» (k)A(k)  (D.2*0 

We  now  have  a  two-point  boundary  value  problem  given  by  (Dc23)  and 

(D0 17)  and  the  constraint  (D„24)e  The  boundary  conditions  are  given  by 

(D.18)  and  (D.22). 
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In  order  to  simplify  future  equations  we  define  the  following 
quantities ; 

B(k)  =  [H(k)P(k)H«(k)  +  R(k)]#  (D.25) 

M(k)  =  B(k)  [  M(k)P(k)F'(k)  +  S'(k)G'(k)]  (D.26) 

We  begin  by  considering  the  boundary  condition  for  x(ofn).  Com- 
bining (D022)  and  (D.24)  yields 

z(o)  =  H(o)ra  +  [H(o)P(o)H«(o)  +  R(o)]  £(o) 

+  [H(o)P(o)F'(o)  +S»(o)G«(o)]  A(o)  (D027) 

t 

Using  the  generalized  inverse  to  solve  for  a  £(o)  yields 

£(o)=B(o)  [z(o)«H(o)m]  -  M(o)X(o)  (D.28) 

Note  that  any  pseudo~inverse  could  have  been  used  to  solve  for  ?(o)„ 
The  generalized  inverse  is  convenient  because  it  allows  us  to  define 
B(k)  uniquely.  Combining  (De28)  and  (D„22)  yields 

x(o|n)  =  m  +  P(o)H»(o)B(o)  [z(o)»H(o)m  ] 

+  [P(o)r(°)-P(°)H'(o)M(o)]  \(o)  (D029) 

or,  equivalently, 

x(oln)  =  x(olo)  +[P(o)F»(o)-P(o)H'(o)M(o)]  _X(°)  (D<>30) 

i 
It  is  reassuring  to  note  that  for  the  special  case  S  =  0,  (De30)  reduces 

to  the  following  familiar  equation  of  Chapter  II; 

x(o|n)  =x(olo)  +  C(o)F'(o)^(o)  ,S(o)  =0       (2.41) 
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We  now  hypothesize  that  the  general  solution  is  of  the  following 
form; 

x(k|n)  =  x(k|k)  +  [P(k)F'(k)  -  P(k)H' (k)M(k)]  \(k)  ,k=0,.„.,n  (D„31 ) 

A(k-1)  =[F»(k)-H'(k)M(k)]  J^(k)  +H»(k)B(k)[z(k)-H(k)x(k|k-1)l 

(D.32) 
where  x(klk),  x(k+ll  k)  and  P(k)  are  given  by  (D„3),  (D.5)  and  (D.6)„  The 
hypothesis  (D.32)  is  equivalent  to  requiring  that  P(k)  be  given  by  the 
following  equation; 

£(k)  =  B(k)  [z(k)-H(k)x(k|k-1)]  -  M(k)\(k)  (Dc33) 

We  have  already  shown  that  (Do31 )  and  (D„33)  are  satisfied  for 
k=o.  To  establish  a  proof  by  induction  we  need  to  show  that  if  (D,31 ) 
and  (D„33)  are  satisfied  for  some  k£  (o,.,.,n-l),  then  (D.31  )  and  (D.33) 
are  satisfied  for  k+1 . 

Suppose  that  (De31  )  and  (De33)  are  satisfied  for  some  k&  (o,...,n-1); 
then  from  (D023)  we  have  the  following  relation; 

x(k+1|n)  =  F(k)x(k|k)  +[F(k)P(k)F» (k)  -  F(k)P(k)H« (k)M(k) 

+  G(k)2(k)G»(k)  -  G(k)S(k)M(k)  ]  \(k) 

+  G(k)S(k)B(k)  [  z(k)  -  H(k)x(k.lk-1)]  (D.3^) 

Using  (D.3)9  (D05)  and  (De6),  we  may  rewrite  (D.34)  in  the  follow- 
ing form; 

x(k+1  In)  =  x(k+1  Ik)  +  P(k+1 )  \_(k)  (De35) 

Substituting  for  \(k)  from  (DJ7)  yields 
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x(k+1 1  n)  =  x(k+1 1  k)  +  P(k+1  )H'  (k+1  )£(k+1 )  +  P(k+1  )F«  (k+1  )^(k+1 )   (D.36) 

From  (D.24)  we  may  write 

z  (k+1 )  =  H(k+1  )x(k+1 1  k)  +  [  H(k+1  )P(k+1  )H'  (k+1  )  +  R(k+1 )]  £(k+1  ) 

+  [  H(k+1  )P(k+1  )F •  (k+1 )  +  S '  (k+1  )G '  (k+1 )]    \(k+1 )  (D.37) 

Using  the  generalized  inverse  to  solve  for  a  g_(k+1 )  we  obtain  the 
following  relation  which  satisfies  the  hypothesis,    (D„37)» 

£(k+1 )  =  B(k+1  )  [  z  (k+1  )-H(k+1  )x(k+1  I  k)]   -  M(k+1  )^(k+1 ) 

Combining  (D038)  and  (D.36)  and  using  (D„3)»  we  obtain  the  following 
equation  which  satisfies  the  hypothesis  (D.31 )  and  completes  the  proof; 

x(k+1 1  n)  =  x(k+1  I  k+1 )  +  [  P(k+1  )F«  (k+1  )-P(k+1  )H»  (k+1  )M(k+1 )]  A(k+1  ) 

(D.39) 
We  may  summarize  these  results  in  the  following  theorem; 

Theorem;  The  general  solution  of  the  linear  estimation  problem  as 
specified  by  the  two-point  boundary  value  problem  (D023),  (D„17),  (D„24)9 
(D0 18)  and  (D022)  is  given  by  the  following  recurrence  relations; 

x(k+1lk)  =  F(k)x(klk«1)  +  M» (k)  [  z(k)-H(k)x(k|k-1 )]  (D„40) 

P(k+1 )  =  F(k)P(k)F!  (k)  +  G(k)£(k)G'  (k)-M'  (k)B#(k)M(k)  (DM  ) 

x(k|k)  =  x(k|k-1)  +  P(k)H'(k)B(k)[  z(k)-H(k)x(klk-1 )]  (D.42) 

x(k|n)  =x(k|k)  +[P(k)F'(k)-P(k)HJ(k)M(k)]  X(k)    k=o,,..,n  (D.43) 

X(k-1)  =[F'(k)-H'(k)M(k)]  \(k)   +  H'(k)B(k)[z(k)-H(k)x(klk-1)]  (D.44) 
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where  B(k)  and  M(k)  are  defined  in  (D.25)  and  (D.26).  The  recursion 
begins  by  replacing  x(klk-l)  for  k=o  by  m  in  (D.^1 )  and  (D.^2).  Pre- 
dictions are  extrapolations  of  x(n+1ln);  for  example, 

x(n+2ln)  =  F(n+1  )x(n+1  |n) 

x(n+3ln)  =  F(n+2)x(n+2|n)  (D.45) 

For  the  special  case  S(k)  =  0  for  all  k  (D.43)  becomes 

x(k|n)  =  x(klk)  +  C(k)F« (k)\(k)  (D.46) 

where  C(k)  is  given  by  (D.4),and  (D.44)  beomes 

^(k«l)  =  {  I-H'(k)  [  H(k)P(k)H' (k)  +  R(k)]#H(k)P(k)  ]  F»(k)2(k) 

+  H'(k)  [  H(k)P(k)H» (k)  +  R(k)]#  [  z (k)-H(k)x(k| k-1 )] 

(D>7) 
Discussion;  Equations  (D.40),  (D.^1 )  and  (D.42)  are  simply  other  forms 
of  (D„5)i,  (D.6)  and  (D<>3)  respectively.  The  generalized  inverse  may  be 
replaced  by  any  pseudo-inverse  if  (D.6)  is  used  in  place  of  (D.41). 

The  calculation  procedure  is  to  use  (D.40),  (D0*H  )  and  (D,42)  to 
obtain  x(k+1lk)  and  x(klk)  for  k=o,«,«,.,n.  Then  calculate  A_(k)  by  back- 
ward recursion  of  (0.44).  Once  _A(k)  is  obtained,  (D.43)  may  be  used  to 
obtain  x(k|  n)  for  all  k  £  (o9..„,n)0 

If  the  measurement  noise  is  not  independent,  i.e.,  E[v(j)v' (k)]  ^  ^^(k) 
we  may  consider  this  noise  to  be  the  output  of  a  linear  system  excited 
by  white  noise  and  adjoin  the  states  of  this  fictitious  linear  system  to 
the  basic  process.  There  is  no  need  to  assume  that  F(k)  is  non-singular. 

We  can  express  the  solution  of  the  linear  smoothing  problem  in  a 
simpler  form  by  writing  the  expressions  in  terms  of  x(k|k-1).  Com- 
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bining  (D„42)  and  (D„^3)  yields 

x(kln)  =  x(klk-l)  +  P(k)H* (k)B(k)  [  z(k)-H(k)x(k|k-1 )] 

+  P(k)  [  F'(k)-H»(k)M(k)]  >(k) 

This  expression  may  be  combined  with  (D«>^)  to  obtain  the  following 
equation;  which  is  a  restatement  of  (Dc35); 

x(kln)  =  x(klk-1  )  +  P(k)>(k«1 )  (D048) 

We  may  summarize  this  result  in  the  following  corollary j 

Corollary;   The  solution  of  the  linear  smoothing  problem  is  given  by 
(D0^8),  (D>0),  (D„41)  and  (D044)6  (D.44)  is  now  valid  for  k=o,...,n0 
The  boundary  condition  ^(n)  =  o  remains  in  effect c  The  a  priori  mean 
m  is  to  be  used  for  x(o|-1)  in  these  expressions. 


,175= 


APPENDIX  E 
PROBABILITY  DENSITY  FUNCTIONALS  FOR  GAUSSIAN  RANDOM  PROCESSES 

In  this  appendix  we  shall  given  an  introductory  discussion  of 
probability  density  functionals.  Our  treatment  will  be  formal,  since 
a  rigorous  discussion  of  "white  noise"  and  probability  density  functionals 
is  beyond  the  scope  of  this  report.  The  concept  of  a  probability  density 
functional  was  introduced  by  Preston  [  53  J  and  was  extended  by  Thomas  and 
Zadeh  [62]  to  include  vector  Gaussian  processes.  For  a  more  abstract 
treatment  the  reader  is  referred  to  the  work  of  Parzen  [  72] . 

Consider  a  vector  random  process  [x(t)| ,  for  o  -  t  -  T,  consisting 
of  a  sequence  of  steps  occurring  at  intervals  T/N  apart.  Assume  that  the 
magnitude  of  the  steps  is  normally  distributed  with  zero  mean  and  the 
following  correlation  (covariance)  matrix; 


E  [x(iT/N)±'(jT/N)]  =  R(iT/N,jT/N) 


i,j  €  (1....N)   (E.1) 


A  sample  function  of  a  scalar  version  of  such  a  process  is  shown  in  Fig. 

E— 1  . 

5<t) 


T 

N 


2T    21 

N     N 


Fig.  E-1 
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The  probability  density  function  for  the  sequence  of  points 
{x(T/N)9x(2T/N),o.,,..>x(T)]  is 


4 


p[x(T/N),...,x(T)j   =  C"  exp  L   }    ^   \  x«  (iT/N)H(iT/N,jT/N)x(jT/N) 

i=1  jst 

(E.2) 

where  C  is  a  constant  of  proportionality  and  where  H  and  R  are  related 
by  the  following  expression; 

K 
V  R(iT/N,kT/N)H(kT/N,jT/N)  =  I  6 ±J  (E.3) 

Let  us  define  a  function,  G(iT/N, jT/n),  as  follows; 

G(iT/N,jT/N)  =  H(iT/N, jT/N)  [n/t]2       ;  (E.4) 

Then  (E02)  may  be  rewritten  in  the  following  form; 

p[x(T/N),eeo,x(T)]     =     c~'  exp    i  -  i     £     X]   *'  (iT/N)G(iT/H,  jT/N)x(jT/N)  [t/n]' 

i=1      j=1 

1  (E.5) 

We  may  also  rewrite  (E.3)  as 


N 
£]     R(iT/N,kT/N)G(kT/N,jT/N)     [T/N]     =     I     ^     [n/t] 


(E.6) 


If  we  take  the  limit  as  N  — »<»  the  resulting  process  (x(t)j  is  a 
Gaussian  random  process.  A  random  process  f  x(t)j  is  said  to  be  Gaussian 
if  the  joint  distribution  for  every  finite  set  of  points  {  x(t.  ),<>..  ,x(t)j 
is  normal „ 

In  the  limit  as  H->co  9  (Ee6)  becomes 
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J        US  ,t)G(t,t£)dt  =  I  S(S-%2),    (o^t^tp^T) 

o  (E.7) 

A  matrix  kernel  G(t,t2)  satisfying  (E„7)  is  said  to  be  inverse  to  the 
correlation  matrix  R(t-|  ,t). 

If  we  were  to  take  the  limit  of  (E„5)  as  N~ >°°  ,  the  constant 
of  proportionality  C  would  become  infinite c  To  avoid  this  difficulty 
we  define  the  probability  density  functional  as  a  ratio,  using  a  definition 
similar  to  that  given  by  Thomas  and  Zadeh  [62J  8  Let  zr  T1  be  a  vector  of 
curves o  Let  h  be  a  small  positive  scalar  and  let  h  be  a  vector,  each 
element  of  which  is  equal  to  h0  Then,  the  probability  density  functional 
of  the  process  {x(t)j  on  the  interval  Lo,t]  is  defined  as  follows; 


■x 


z 


=    lira    Pr  [  z(t)-h  <x(t)  -  z(t)+h,~  for  o  ^  t  ^  Tj 


*M]i      h_,0 


ft*  [-£  <  £(t)  -  £,  f or  o  ^  t  ^  Tl  (E.8) 

T      T 

=  exp  <-  |  f  f  z'(t1)G(t1,t?)z(t?)  dt1  dt? I 

[      o7    oJ        112    2       2j  (Eo9) 

Note  the  similarity  between  (E„5)  and  (E,9)e 

White  Gaussain  noise  is  defined  to  be  the  formal  limit  of  a 
Gaussian  process  as  the  values  of  (x(t) j  at  different  instants  of  time 

become  statistically  independent,.  In  this  case  R(t^,t2)  becomes 

-1 
R(t.j )  c£(t.j=.tp)  and  G(t..  ,t2)  becomes  R  (t1 )  ^(t.j-tg).  Then  (Ec9)  becomes 


'x 


-[o.T] 


-T 


exp    <-     i 


f         z'(t)R"1(t)z(t)  dt 
oJ  I  (E.10) 


T  2 

-     *       /       II  £(t)  II      1 

[  °  £ 


=       exp     <-     i        /         ||  z(t)  ||      «         dt 

(t)  (E.11) 
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Probability  density  functionals  are  a  powerful  tool  when  used  for 
problems  involving  Gaussian  random  processes  on  a  finite  time  interval 
and  will  be  used  in  Chapter  III  of  this  report. 
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